I’m building a Scrapy-based crawler and facing Cloudflare protection on some sites.
Here’s my current setup:
I have a separate API service that can bypass Cloudflare by simulating a real browser (e.g., using Playwright or similar).
This API returns a JSON object containing headers (like `User-Agent` and `Referer`) and cookies (like `cf_clearance`), for example:

```json
{
  "url": "https://example.com/feed/",
  "headers": {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Referer": "https://example.com/feed/?__cf_chl_tk=..."
  },
  "cookies": [
    {"name": "cf_clearance", "value": "...", "domain": ".example.com", "path": "/"}
  ],
  "bypassed": true
}
```

Inside my Scrapy downloader middleware, if a request comes back with a 403, I call this API, take the returned headers and cookies, and retry the same URL through Scrapy:
```python
if response.status == 403:
    resp = requests.post(
        "http://localhost:8001/api/v1/crawler/fetch/",
        data={"url": request.url},
    )
    api_data = resp.json()
    if api_data.get("bypassed"):
        new_request = request.replace(
            headers=api_data["headers"],
            cookies={"cf_clearance": api_data["cookies"][0]["value"]},
            dont_filter=True,
        )
        return new_request
```
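For context, here is roughly how that snippet sits inside the middleware. This is a simplified sketch of my setup rather than the exact production code: the class name is made up, the endpoint is just my local bypass service, and here I copy all returned cookies into the retried request instead of only `cf_clearance`:

```python
# middlewares.py -- simplified sketch of the bypass middleware
import requests


class CloudflareBypassMiddleware:
    """Retry 403 responses with headers/cookies obtained from the bypass API."""

    API_URL = "http://localhost:8001/api/v1/crawler/fetch/"  # local bypass service

    def process_response(self, request, response, spider):
        if response.status != 403:
            return response

        # Blocking call to the browser-based bypass service.
        resp = requests.post(self.API_URL, data={"url": request.url})
        api_data = resp.json()

        if not api_data.get("bypassed"):
            return response

        # Re-issue the same URL with the browser-derived headers and cookies
        # (cf_clearance included); returning a Request from process_response
        # makes Scrapy reschedule it.
        cookies = {c["name"]: c["value"] for c in api_data.get("cookies", [])}
        return request.replace(
            headers=api_data["headers"],
            cookies=cookies,
            dont_filter=True,
        )
```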
My question:
If I already have the correct `User-Agent`, `Referer`, and `cf_clearance` cookie from a previous bypass,
can I reliably continue to bypass Cloudflare challenges using only Scrapy requests (without rendering or executing JS)?
Or does Cloudflare revalidate sessions with other browser-based checks (such as JS execution or TLS fingerprinting) that can’t be replicated just by sending headers and cookies?