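Many websites load content dynamically with JavaScript, so the HTML returned by a plain HTTP request often lacks the data you want. Parsing libraries like BeautifulSoup do not execute JavaScript, so they cannot see that content on their own. In this tutorial, we will use Selenium, which drives a real browser, to extract data from JavaScript-heavy websites.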
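To make the limitation concrete, here is a minimal sketch that uses only Python's standard-library HTML parser. The HTML snippet and the TextCollector class are made up for illustration; the point is that a static parser sees the markup as delivered, before any script has run.

```python
from html.parser import HTMLParser

# Hypothetical page source: the <div> stays empty until the script runs in a browser.
RAW_HTML = """
<html><body>
  <div class="dynamic-element"></div>
  <script>
    document.querySelector(".dynamic-element").textContent = "Loaded by JavaScript";
  </script>
</body></html>
"""


class TextCollector(HTMLParser):
    """Collects the text found inside <div class="dynamic-element">."""

    def __init__(self):
        super().__init__()
        self.inside_target = False
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "dynamic-element") in attrs:
            self.inside_target = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside_target = False

    def handle_data(self, data):
        if self.inside_target:
            self.text_parts.append(data.strip())


parser = TextCollector()
parser.feed(RAW_HTML)
static_text = "".join(parser.text_parts)
print(repr(static_text))  # the div has no text because no JavaScript was executed
```

A browser driven by Selenium would run the script first, so the same element would contain text by the time we read it.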
Prerequisites
Ensure you have Python installed, along with the following libraries:
pip install selenium webdriver-manager
Step 1: Setting Up Selenium
First, we need to configure Selenium with a web driver to automate browser interactions.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
# Set up the driver
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
Step 2: Navigating a Dynamic Website
We can use Selenium to load a page and extract dynamically generated content.
url = "https://example.com"
driver.get(url)
# Wait for elements to load
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "dynamic-element"))
)
# Extract text
element = driver.find_element(By.CLASS_NAME, "dynamic-element")
print("Extracted Text:", element.text)
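It may help to see what an explicit wait does conceptually. The sketch below is a simplified, Selenium-free illustration of the polling idea, not Selenium's actual implementation: the wait_until function and the simulated content_ready check are hypothetical names. WebDriverWait behaves similarly in that it re-checks a condition at a fixed interval (0.5 seconds by default) and raises a timeout error if the condition never becomes true.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated "page" whose content only appears on the third check.
attempts = {"count": 0}


def content_ready():
    attempts["count"] += 1
    return "dynamic text" if attempts["count"] >= 3 else None


value = wait_until(content_ready, timeout=2.0, poll_interval=0.01)
print(value, "after", attempts["count"], "checks")
```

This is why an explicit wait is usually preferable to a fixed time.sleep call: it returns as soon as the content is available and fails loudly when it never arrives.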
Step 3: Interacting with Web Elements
Selenium allows us to interact with buttons, input fields, and dropdowns.
search_box = driver.find_element(By.NAME, "q")
search_box.send_keys("Web Scraping with Selenium")
search_box.submit()
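Input fields sometimes contain a pre-filled or leftover value, in which case send_keys appends to it rather than replacing it. One way to guard against that is a small helper like the one below. The fill_and_submit name is hypothetical; clear, send_keys, and submit are the standard WebElement methods.

```python
def fill_and_submit(field, text):
    """Clear an input field, type `text` into it, then submit its form."""
    field.clear()          # remove any pre-filled or leftover value
    field.send_keys(text)  # type the new value
    field.submit()         # submit the form that contains the field


# Usage with the driver from Step 1 (assumes the page has an input named "q"):
# fill_and_submit(driver.find_element(By.NAME, "q"), "Web Scraping with Selenium")
```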
Step 4: Extracting Data from Multiple Pages
We can scrape multiple pages by clicking the 'Next' button.
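from selenium.common.exceptions import NoSuchElementException, TimeoutException

while True:
    try:
        # Find and click the 'Next' link; raises NoSuchElementException on the last page
        next_button = driver.find_element(By.LINK_TEXT, "Next")
        next_button.click()
        # Wait for the new page's content before continuing
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "dynamic-element"))
        )
    except (NoSuchElementException, TimeoutException):
        print("No more pages.")
        break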
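The loop above only moves between pages. To actually collect data from each page, one option is to separate "read the current page" from "go to the next page" and cap the number of pages as a safety net. The sketch below is one possible structure under those assumptions; scrape_all_pages, read_page, go_to_next_page, and max_pages are hypothetical names, not part of Selenium.

```python
def scrape_all_pages(read_page, go_to_next_page, max_pages=50):
    """Collect items from each page until there is no next page or the cap is hit."""
    collected = []
    for _ in range(max_pages):
        collected.extend(read_page())
        if not go_to_next_page():
            break
    return collected


# Possible wiring with Selenium, reusing the names from the steps above:
#
# def read_page():
#     elements = driver.find_elements(By.CLASS_NAME, "dynamic-element")
#     return [element.text for element in elements]
#
# def go_to_next_page():
#     links = driver.find_elements(By.LINK_TEXT, "Next")  # empty list if absent
#     if not links:
#         return False
#     links[0].click()
#     return True
#
# all_items = scrape_all_pages(read_page, go_to_next_page)
```

The max_pages cap guards against sites where the 'Next' link never disappears, which would otherwise keep the loop running indefinitely.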
Step 5: Closing the Browser
After scraping, we should close the browser to free up resources.
driver.quit()
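If the scraping code raises an exception before reaching driver.quit(), the browser process can be left running. A common way to avoid that is to put the cleanup in a finally block. The sketch below wraps this in a context manager; managed_driver and create_driver are hypothetical names used for illustration.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(create_driver):
    """Create a driver, hand it to the caller, and always quit it afterwards."""
    driver = create_driver()
    try:
        yield driver
    finally:
        driver.quit()  # runs even if the scraping code raised an exception


# Usage with the setup from Step 1:
# with managed_driver(lambda: webdriver.Chrome(service=service)) as driver:
#     driver.get("https://example.com")
```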
Conclusion
In this tutorial, we used Selenium to scrape dynamic websites that rely on JavaScript. With Selenium, you can automate interactions, extract real-time data, and navigate complex pages.
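Note: Some websites detect automation tools. Adding delays between actions and setting a realistic user agent can reduce the chance of being blocked. Headless mode runs the browser without a visible window, which saves resources, but it does not by itself hide automation.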
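The options mentioned in the note could be wired up roughly as follows. This is a sketch under assumptions: build_chrome_args and polite_pause are hypothetical helpers, and USER_AGENTS in the usage comment stands for a list of user-agent strings you would supply yourself. The --headless=new and --user-agent flags are Chrome command-line switches; older Chrome versions used plain --headless.

```python
import random
import time


def build_chrome_args(headless=True, user_agent=None):
    """Return Chrome command-line arguments for the requested settings."""
    args = []
    if headless:
        args.append("--headless=new")
    if user_agent:
        args.append(f"--user-agent={user_agent}")
    return args


def polite_pause(min_seconds=1.0, max_seconds=3.0, sleep=time.sleep):
    """Sleep for a random duration between the two bounds and return it."""
    delay = random.uniform(min_seconds, max_seconds)
    sleep(delay)
    return delay


# Usage with the setup from Step 1 (USER_AGENTS is a list you define yourself):
# from selenium.webdriver.chrome.options import Options
# options = Options()
# for arg in build_chrome_args(user_agent=random.choice(USER_AGENTS)):
#     options.add_argument(arg)
# driver = webdriver.Chrome(service=service, options=options)
# polite_pause()  # call between page loads or clicks
```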