Introduction
Web scraping can feel like navigating a minefield when servers block your requests with 403 Forbidden errors. These errors often occur because websites detect non-browser traffic (like scripts) through mechanisms like TLS fingerprinting, header validation, or IP blocking. While tools like Selenium mimic browsers, they’re resource-heavy. In this guide, I’ll share multiple proven techniques to bypass 403 errors using Python, including a hidden gem: curl_cffi.
The Problem: 403 Forbidden Hell
While trying to scrape some data from a website, my Python script using the popular requests library kept hitting a brick wall:
import requests

response = requests.get(url, headers=perfect_headers)  # Always returns 403!
Despite:
- Perfectly replicated headers (via MITMproxy)
- Matching cookies
- Correct user-agent
- Proper TLS configuration
The server kept rejecting my requests with 403 Forbidden errors. Why?
Why 403 Errors Happen
- Missing/invalid headers: Servers check for browser-like headers (e.g., sec-ch-ua, user-agent).
- TLS/JA3 fingerprinting: Servers detect non-browser TLS handshakes.
- IP rate limiting: Too many requests from the same IP.
- Path/protocol validation: Unusual URLs or HTTP versions may trigger suspicion.
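Before guessing which check you tripped, it helps to look at what the server actually sends back; the status code, Server header, and body usually hint at the blocking mechanism. A quick diagnostic sketch (the URL is a placeholder):

import requests

response = requests.get("https://example.com/", timeout=10)
print(response.status_code)              # 403?
print(response.headers.get("server"))    # e.g. cloudflare, nginx, AkamaiGHost
print(response.text[:500])               # block pages often name the WAF or demand a CAPTCHA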
The Culprit in My Case: TLS Fingerprinting
Modern websites don’t just check headers — they analyze your TLS handshake fingerprint (JA3). Libraries like requests and urllib have distinct fingerprints that scream "BOT!" to servers.
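You can see this for yourself by asking a TLS-fingerprint echo service what your client looks like. A minimal sketch (I'm using BrowserLeaks' JSON endpoint as an example; the exact response fields may vary, so treat the key name as an assumption):

import requests

# Plain requests produces a JA3 hash that no real browser sends
resp = requests.get("https://tls.browserleaks.com/json")
print(resp.json().get("ja3_hash"))  # compare this against the hash a real browser reports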
The Solution
Use curl_cffi to impersonate browser TLS fingerprints. The curl_cffi library combines cURL's power with browser-like TLS fingerprints, so your requests pass the JA3 checks that flag standard Python clients.
1. Installation
pip install curl_cffi
2. The Magic Code
# Install: pip install curl_cffi
from curl_cffi import requests

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    "accept": "*/*",
    "referer": "https://example.com",
}

response = requests.get(
    "https://example.com/",
    headers=headers,
    impersonate="chrome110",  # Mimics Chrome 110 TLS
)
Impersonation Targets:
# A few of the available options (see the curl_cffi docs for the full list)
- impersonate="chrome110"
- impersonate="chrome120"
- impersonate="safari15_5"
Key Differentiators
- impersonate parameter specifying Chrome 110
- No SSL verification needed
- Automatic handling of HTTP/2 and brotli encoding
Why This Works
- Spoofs Chrome’s TLS fingerprint, making the request appear browser-like.
- Avoids the need for Selenium or headless browsers.
Tips:
- Add random delays between requests
- Rotate user-agent strings
- Use proxy rotation
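A minimal sketch combining those tips with curl_cffi (the URLs, proxy credentials, and user-agent strings are placeholders you'd replace with your own):

import random
import time
from curl_cffi import requests

urls_to_scrape = ["https://example.com/page1", "https://example.com/page2"]
user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36...",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...",
]
proxy_pool = [
    "http://user:pass@proxy1_ip:port",
    "http://user:pass@proxy2_ip:port",
]

for url in urls_to_scrape:
    proxy = random.choice(proxy_pool)
    response = requests.get(
        url,
        headers={"user-agent": random.choice(user_agents)},  # rotate user-agents
        proxies={"http": proxy, "https": proxy},              # rotate proxies
        impersonate="chrome110",
    )
    time.sleep(random.uniform(2, 5))  # random delay between requests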
Other Solutions to try
1. Refine Headers to Match Browser Requests
Capture headers from a real browser (using Chrome DevTools or mitmproxy) and include all critical headers such as sec-ch-ua, sec-fetch-*, referer, and origin.
Example:
headers = {"sec-ch-ua": '"Google Chrome";v="131", "Chromium";v="131", "Not-A Brand";v="24"',"sec-ch-ua-mobile": "?0", "sec-ch-ua-platform": "Windows","sec-fetch-site": "same-origin","sec-fetch-mode": "cors","referer": "https://example.com/","priority": "u=1, i"}
Tip: Simplify headers if they conflict (e.g., use accept: */* instead of complex values).
2. Use Sessions and Rotate User-Agents
Persist cookies and rotate headers with requests.Session:
# Install: pip install fake-useragent
import requests
from fake_useragent import UserAgent

session = requests.Session()
ua = UserAgent()
headers = {
    "user-agent": ua.chrome,
    "accept-language": "en-US,en;q=0.9",
}
session.headers.update(headers)

response = session.get("https://example.com/")
3. Spoof HTTP/2 with httpx
Some sites require HTTP/2 support. Use httpx for HTTP/2 compatibility:
# Install: pip install "httpx[http2]"
import httpx

with httpx.Client(http2=True, headers=headers) as client:
    response = client.get("https://example.com/")
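To confirm that HTTP/2 was actually negotiated, httpx records the protocol on the response object:

print(response.http_version)  # "HTTP/2" if negotiated, otherwise "HTTP/1.1"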
4. Bypass Path Validation
Modify the URL to trick path-based filters:
url = "https://example.com// # Add trailing slashes # OR url = "https://example.com/?cache=1" # Add dummy params
5. Route Through Proxies
Rotate IPs to avoid blocks:
proxies = {"http": "http://user:pass@proxy_ip:port", "https": "http://user:pass@proxy_ip:port"} response = requests.get(url, headers=headers, proxies=proxies)
Free Proxies: Use services like FreeProxyList, but expect instability.
6. Disable SSL Verification (Last Resort)
If the site blocks non-browser SSL handshakes:
response = requests.get(url, headers=headers, verify=False) # Use with caution!
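If you do disable verification, requests (via urllib3) prints an InsecureRequestWarning on every call; once you've accepted the risk, you can silence it:

import urllib3

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)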
Conclusion
Bypassing 403 errors requires mimicking browsers at multiple levels: headers, TLS fingerprints, and request patterns. While curl_cffi is a game-changer, combining it with header refinement, HTTP/2, and proxies ensures robust scraping. Always respect robots.txt and avoid overloading servers.
Got your own 403 horror story? Share your experiences in the comments!
⚠️ Disclaimer: This article is for educational purposes only. Always obtain proper authorization before scraping any website.