Web scraping is a powerful technique for extracting data from websites. In this tutorial, we will build a stock market data scraper using Python, BeautifulSoup, and requests. This will allow us to collect and analyze stock prices from a financial website.
Prerequisites
Before we start, make sure you have Python installed along with the following libraries:
pip install requests beautifulsoup4 pandas
Step 1: Fetching Stock Market Data
First, we need to send an HTTP request to the financial website and extract the HTML content.
import requests
from bs4 import BeautifulSoup
url = "https://finance.yahoo.com/quote/AAPL?p=AAPL" # Replace with the stock page you want
headers = {"User-Agent": "Mozilla/5.0"}
response = requests.get(url, headers=headers)
if response.status_code == 200:
    soup = BeautifulSoup(response.text, "html.parser")
    print(soup.prettify())  # Print the formatted HTML
else:
    print("Failed to retrieve data")
Step 2: Extracting Stock Price
Now that we have the HTML content, let's extract the stock price. Yahoo Finance renders the live price in a fin-streamer element, so we locate it by tag name and its data-field attribute (site markup changes over time, so this selector may need updating).
stock_price = soup.find("fin-streamer", {"data-field": "regularMarketPrice"})
if stock_price:
    print("Stock Price:", stock_price.text)
else:
    print("Stock price not found")
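The extracted value is a display string, which may contain thousands separators. Before doing any arithmetic it is worth converting it to a float defensively. Here is a minimal sketch; the helper name parse_price is illustrative, and the HTML snippet simply mimics the shape of Yahoo's markup so the example runs offline:

```python
from bs4 import BeautifulSoup

def parse_price(text):
    """Convert a displayed price like '1,234.56' to a float; return None on failure."""
    try:
        return float(text.replace(",", ""))
    except (ValueError, AttributeError):
        return None

# Offline demonstration with a static snippet shaped like Yahoo's markup
html = '<fin-streamer data-field="regularMarketPrice">1,234.56</fin-streamer>'
tag = BeautifulSoup(html, "html.parser").find(
    "fin-streamer", {"data-field": "regularMarketPrice"}
)
print(parse_price(tag.text))  # 1234.56
```

Returning None instead of raising keeps the scraper loop alive when the page layout changes or the price is temporarily missing.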
Step 3: Storing the Data in a CSV File
We can store the extracted data in a CSV file using pandas for future analysis.
import os
import pandas as pd
from datetime import datetime
data = {"Time": [datetime.now().strftime("%Y-%m-%d %H:%M:%S")], "Stock Price": [stock_price.text]}
df = pd.DataFrame(data)
# Write the header only on the first run, when the CSV does not yet exist
df.to_csv("stock_prices.csv", mode="a", header=not os.path.exists("stock_prices.csv"), index=False)
print("Data saved to stock_prices.csv")
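Since every run appends a row, it helps to wrap the write in a small function so the header logic lives in one place. A minimal sketch follows; the function name append_quote and the demo filename are illustrative, not part of the tutorial's scraper:

```python
import os
from datetime import datetime

import pandas as pd

def append_quote(path, price):
    """Append one timestamped price row, writing the header only if the file is new."""
    row = {"Time": [datetime.now().strftime("%Y-%m-%d %H:%M:%S")], "Stock Price": [price]}
    pd.DataFrame(row).to_csv(path, mode="a", header=not os.path.exists(path), index=False)

append_quote("demo_prices.csv", "189.84")
append_quote("demo_prices.csv", "190.12")
print(pd.read_csv("demo_prices.csv"))  # two rows, one header
```

Because the header is written only when the file does not exist yet, repeated runs produce a clean, parseable CSV rather than a header line before every row.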
Step 4: Automating the Scraper
To keep track of stock prices over time, we can run the scraper on a timer using the schedule module (install it with pip install schedule).
import schedule
import time
def scrape_stock():
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, "html.parser")
        stock_price = soup.find("fin-streamer", {"data-field": "regularMarketPrice"})
        if stock_price:
            print("Stock Price:", stock_price.text)
            data = {"Time": [datetime.now().strftime("%Y-%m-%d %H:%M:%S")], "Stock Price": [stock_price.text]}
            df = pd.DataFrame(data)
            # Write the header only when the file does not yet exist
            df.to_csv("stock_prices.csv", mode="a", header=not os.path.exists("stock_prices.csv"), index=False)
        else:
            print("Stock price not found")
schedule.every(10).minutes.do(scrape_stock)
while True:
    schedule.run_pending()
    time.sleep(1)
Conclusion
In this guide, we built a simple web scraper to fetch stock market data, store it in a CSV file, and automate the process using scheduling. This can be extended further to analyze stock trends, visualize data, or integrate with trading algorithms.
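As a taste of the analysis step, the CSV the scraper produces can be loaded back with pandas and smoothed with a rolling average. The sketch below uses an in-memory sample with the same two columns the scraper writes, so it runs without a CSV file; the column names and the 3-point window are just assumptions for illustration:

```python
import pandas as pd

# Small in-memory sample with the same columns the scraper writes
df = pd.DataFrame({
    "Time": pd.date_range("2024-01-01 09:30", periods=5, freq="10min"),
    "Stock Price": [189.2, 189.5, 189.1, 189.9, 190.3],
})
df["Stock Price"] = pd.to_numeric(df["Stock Price"])  # prices arrive as strings from scraping
df["3-point average"] = df["Stock Price"].rolling(3).mean()
print(df)
```

For real data, replace the sample DataFrame with pd.read_csv("stock_prices.csv"); the pd.to_numeric call matters there because the scraped prices are stored as text.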
Note: Some financial websites use anti-scraping techniques, and scraping may violate a site's terms of service. If you encounter issues, consider a data API such as Alpha Vantage, or the yfinance Python library, instead of scraping.