Week Three Highlights
We had a long weekend, so Monday was different than usual, and I got to sleep in!
Tuesday and Wednesday- I was out of the office due to health reasons.
Thursday- Worked on collecting data from the Iowa Grocers Excel file for prices on eggs, bacon, and heirloom tomatoes. Finished the Iowa cities starting with the letter C, then moved on to cities starting with I, and also completed cities from O through R and W.
Finished the DataCamp Web Scraping course.
Friday- Started developing a web-scraping script in Python using BeautifulSoup, requests, and pandas. Here is the current draft:
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Read the input Excel file
input_file = "grocery_websites.xlsx"
df = pd.read_excel(input_file)

# Collect results as a list of rows; building the DataFrame once at the
# end avoids DataFrame.append, which was removed in pandas 2.0
results = []

# Scrape prices for each website
for index, row in df.iterrows():
    website = row["Website"]
    product = row["Product"]
    url = row["URL"]

    try:
        # Send a GET request to the website (a timeout keeps one dead site from hanging the whole run)
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.content, "html.parser")

        # Find the price element on the page
        price_element = soup.find("span", class_="price-amount")

        if price_element:
            price = price_element.text.strip()
            results.append({"Website": website, "Product": product, "Price": price})
        else:
            results.append({"Website": website, "Product": product, "Price": "Not found"})

    except requests.exceptions.RequestException as e:
        print(f"Error scraping {website}: {e}")
        results.append({"Website": website, "Product": product, "Price": "Error"})

# Build the result DataFrame and write it to a new Excel file
result_df = pd.DataFrame(results, columns=["Website", "Product", "Price"])
output_file = "grocery_prices.xlsx"
result_df.to_excel(output_file, index=False)
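For reference, the script assumes grocery_websites.xlsx has one row per product page with Website, Product, and URL columns. Here is a minimal sketch for creating a small sample input file to test against (the store name and URL below are made-up placeholders, not real data):

import pandas as pd

# Hypothetical sample input matching the columns the scraper reads;
# the website name and URL are placeholders, not real pages.
sample = pd.DataFrame(
    {
        "Website": ["Example Grocer"],
        "Product": ["Eggs"],
        "URL": ["https://example.com/products/eggs"],
    }
)
sample.to_excel("grocery_websites.xlsx", index=False)

Note that reading and writing .xlsx files with pandas requires the openpyxl package to be installed.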
The script is still a work in progress, but the goal is to make the web scraping as autonomous as possible.