Scraping Dynamic Websites with Selenium and Python

Learn how to scrape JavaScript-heavy websites using Selenium and Python. This tutorial covers setup, extracting dynamic content, and automating data collection.

Many websites load content dynamically using JavaScript, making it difficult to scrape data with traditional libraries like BeautifulSoup. In this tutorial, we will use Selenium to extract data from JavaScript-heavy websites.

Prerequisites

Ensure you have Python installed, along with the following libraries:

    pip install selenium webdriver-manager

Step 1: Setting Up Selenium

First, we need to configure Selenium with a web driver to automate browser interactions.

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # Set up the driver
    service = Service(ChromeDriverManager().install())
    driver = webdriver.Chrome(service=service)

Step 2: Navigating a Dynamic Website

We can use Selenium to load a page and extract dynamically generated content.

    url = "https://example.com"
    driver.get(url)

    # Wait for elements to load
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "dynamic-element"))
    )

    # Extract text
    element = driver.find_element(By.CLASS_NAME, "dynamic-element")
    print("Extracted Text:", element.text)
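When a page contains several matching elements, `find_elements` (plural) returns them all as a list. A minimal sketch, reusing the `dynamic-element` class name from the example above:

    from selenium.webdriver.common.by import By

    def extract_texts(driver, class_name="dynamic-element"):
        """Return the text of every element with the given class.
        The class name is carried over from the example above; adjust
        it for the page you are scraping."""
        return [el.text for el in driver.find_elements(By.CLASS_NAME, class_name)]

Unlike `find_element`, `find_elements` returns an empty list rather than raising an exception when nothing matches, which makes it convenient for optional content.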

Step 3: Interacting with Web Elements

Selenium allows us to interact with buttons, input fields, and dropdowns.

    search_box = driver.find_element(By.NAME, "q")
    search_box.send_keys("Web Scraping with Selenium")
    search_box.submit()
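For dropdowns, Selenium provides the `Select` helper, which wraps a `<select>` element and exposes its options. A sketch with a hypothetical element name:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    def choose_option(driver, select_name, label):
        """Pick a dropdown option by its visible text.
        'select_name' is hypothetical; use the actual name attribute
        of the <select> element on the target page."""
        dropdown = Select(driver.find_element(By.NAME, select_name))
        dropdown.select_by_visible_text(label)

`Select` also offers `select_by_value` and `select_by_index` if the visible labels are not stable.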

Step 4: Extracting Data from Multiple Pages

We can scrape multiple pages by clicking the 'Next' button.

    from selenium.common.exceptions import NoSuchElementException, TimeoutException

    while True:
        try:
            next_button = driver.find_element(By.LINK_TEXT, "Next")
            next_button.click()
            WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.CLASS_NAME, "dynamic-element"))
            )
        except (NoSuchElementException, TimeoutException):
            print("No more pages.")
            break
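In practice you usually want to collect data on each page before moving on. A sketch that combines the pagination loop with extraction, assuming the same `dynamic-element` class and "Next" link text as above:

    from selenium.common.exceptions import NoSuchElementException, TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    def scrape_all_pages(driver, timeout=10):
        """Collect text from every page by following 'Next' links.
        The class name and link text are assumptions carried over
        from the examples above."""
        results = []
        while True:
            # Grab everything on the current page first
            results.extend(
                el.text for el in driver.find_elements(By.CLASS_NAME, "dynamic-element")
            )
            try:
                driver.find_element(By.LINK_TEXT, "Next").click()
                WebDriverWait(driver, timeout).until(
                    EC.presence_of_element_located((By.CLASS_NAME, "dynamic-element"))
                )
            except (NoSuchElementException, TimeoutException):
                break
        return results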

Step 5: Closing the Browser

After scraping, we should close the browser to free up resources.

    driver.quit()

Conclusion

In this tutorial, we used Selenium to scrape dynamic websites that rely on JavaScript. With Selenium, you can automate interactions, extract real-time data, and navigate complex pages.

Note: Some websites detect automation tools. To avoid detection, consider using headless mode, rotating user agents, or implementing delays between actions.
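A minimal sketch of the headless-mode and user-agent configuration the note describes; the user-agent string below is just an example, and in a real scraper you would rotate through several:

    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # run Chrome without a visible window
    options.add_argument(
        "user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
    )  # example user-agent string

    # Pass the options when creating the driver:
    # driver = webdriver.Chrome(service=service, options=options)

For delays between actions, `time.sleep` with a small randomized interval is a common, simple approach.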

© infoTequick. All rights reserved. Distributed by ASThemesWorld