I’m trying to scrape metadata from https://yellowpages.com.eg/en/category/abrasives using Selenium and BeautifulSoup. I can successfully extract some data, but I’m having trouble getting the text of an a tag nested inside a div while looping over the results. Here’s my current code:
import time

from bs4 import BeautifulSoup
from selenium import webdriver

base_url = "https://yellowpages.com.eg"  # taken from the URL above
pagecount = 1
driver = webdriver.Chrome()
page_url = f"{base_url}/en/category/abrasives/p{pagecount}"
driver.get(page_url)
driver.implicitly_wait(10)
time.sleep(1)  # pause before reading the source, not after
page_source = driver.page_source
bs = BeautifulSoup(page_source, 'html.parser')
divs = bs.find_all('div', class_='col-xs-12 item-details')
for div in divs:
    img_tag = div.find('img')
    if img_tag:
        # .get() avoids a KeyError when the attribute is missing
        img_src = img_tag.get('data-src')
        print(img_src)
    title = div.find('a', class_='item-title').text.strip()
    print(title)
    address = div.find('a', class_='address-text').find('span').text.strip()
    print(address)
    # description = div.find('div', class_='item-aboutUs')
    descriptions = div.find_all('div', class_='item-aboutUs')
    print(descriptions)  # prints the matched tags, not their text
Issue:
I want to ensure that I’m correctly extracting the text from the a tag inside the item-aboutUs div. Is there a better way to handle this, especially if there are multiple item-aboutUs divs?
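One possible approach, as a minimal sketch: loop over every item-aboutUs div in a listing, collect the text of each a tag inside it, and return an empty list when a listing has none. The sample HTML, the company names, and the helper name extract_descriptions are made up for illustration; the real page's markup may differ, so the selectors would need checking against the live source.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the actual structure may differ.
SAMPLE_HTML = """
<div class="col-xs-12 item-details">
  <a class="item-title">Acme Abrasives</a>
  <div class="item-aboutUs"><a href="/a">Grinding wheels and discs</a></div>
  <div class="item-aboutUs"><a href="/b">  Sandpaper supplier  </a></div>
</div>
<div class="col-xs-12 item-details">
  <a class="item-title">No Description Co</a>
</div>
"""


def extract_descriptions(item):
    """Return the stripped text of every <a> inside each item-aboutUs div."""
    texts = []
    for about in item.find_all("div", class_="item-aboutUs"):
        for link in about.find_all("a"):
            text = link.get_text(strip=True)
            if text:  # skip links that contain no text
                texts.append(text)
    return texts


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# select() matches on the item-details class regardless of other classes or their order.
results = [extract_descriptions(item) for item in soup.select("div.item-details")]
print(results)
```

In the scraping loop, the same helper could be called as extract_descriptions(div) in place of the find_all/print pair. Because find_all returns an empty list when nothing matches, listings without an item-aboutUs div need no special handling.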