I’m building a Scrapy-based crawler and facing Cloudflare protection on some sites.

Here’s my current setup:

  • I have a separate API service that can bypass Cloudflare by simulating a real browser (e.g., using Playwright).

  • This API returns a JSON object containing headers (like User-Agent, Referer) and cookies (like cf_clearance), for example:

    {
      "url": "https://example.com/feed/",
      "headers": {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
        "Referer": "https://example.com/feed/?__cf_chl_tk=..."
      },
      "cookies": [
        {"name": "cf_clearance", "value": "...", "domain": ".example.com", "path": "/"}
      ],
      "bypassed": true
    }
    
  • Inside my Scrapy downloader middleware, if a response comes back with status 403, I call this API and get the above data.

  • Then I retry the same URL in Scrapy with the returned headers and cookies.

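    import requests  # at module top

    if response.status == 403:
        # Ask the bypass service for fresh headers and cookies (blocking call).
        resp = requests.post(
            "http://localhost:8001/api/v1/crawler/fetch/",
            data={"url": request.url},
            timeout=60,
        )
        api_data = resp.json()

        if api_data.get("bypassed"):
            # Forward every returned cookie by name instead of assuming
            # cf_clearance is always the first entry.
            cookies = {c["name"]: c["value"] for c in api_data["cookies"]}
            return request.replace(
                headers=api_data["headers"],
                cookies=cookies,
                dont_filter=True,
            )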
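
A minimal sketch of how the retry step above could be hardened, under a few assumptions: the helper name `build_retry_kwargs`, the meta key `cf_bypass_attempts`, and the cap of 2 attempts are all hypothetical and not part of my current setup. It keeps each cookie's domain and path (Scrapy accepts cookies as a list of dicts) and stops retrying after a fixed number of attempts, so a site that keeps answering 403 cannot cause an endless loop.

```python
from __future__ import annotations

from typing import Any, Optional

MAX_BYPASS_ATTEMPTS = 2              # hypothetical cap, chosen for illustration
ATTEMPTS_KEY = "cf_bypass_attempts"  # hypothetical request.meta key


def build_retry_kwargs(
    api_data: dict[str, Any], meta: dict[str, Any]
) -> Optional[dict[str, Any]]:
    """Return kwargs for request.replace(), or None if no retry should happen."""
    attempts = meta.get(ATTEMPTS_KEY, 0)
    if not api_data.get("bypassed") or attempts >= MAX_BYPASS_ATTEMPTS:
        return None

    # Keep name, value, domain and path so each cookie stays scoped correctly.
    cookies = [
        {k: c[k] for k in ("name", "value", "domain", "path") if k in c}
        for c in api_data.get("cookies", [])
    ]
    return {
        "headers": dict(api_data.get("headers", {})),
        "cookies": cookies,
        "meta": {**meta, ATTEMPTS_KEY: attempts + 1},
        "dont_filter": True,
    }
```

Inside the middleware this would be used roughly as `kwargs = build_retry_kwargs(api_data, request.meta)` followed by `return request.replace(**kwargs) if kwargs else response`.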
    

My question:
If I already have the correct User-Agent, Referer, and cf_clearance cookie from a previous bypass,
can I reliably continue to bypass Cloudflare challenges using only Scrapy requests (without rendering or executing JS)?

Or does Cloudflare revalidate sessions using other browser-based checks (like JS or TLS fingerprints) that can’t be replicated just by sending headers and cookies?
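
A minimal sketch for checking this empirically: a heuristic that flags whether a replayed response still looks like a Cloudflare challenge. It relies on the `cf-mitigated: challenge` response header that Cloudflare documents for challenge pages; the fallback on status code plus `Server: cloudflare` is only an assumption, and the function name `looks_like_cf_challenge` is hypothetical.

```python
from typing import Mapping


def looks_like_cf_challenge(status: int, headers: Mapping[str, str]) -> bool:
    """Heuristic: True when a response appears to be a Cloudflare challenge."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}

    # Documented signal: challenge responses carry "cf-mitigated: challenge".
    if lowered.get("cf-mitigated") == "challenge":
        return True

    # Fallback assumption: a 403/503 served by Cloudflare counts as a challenge.
    return status in (403, 503) and lowered.get("server") == "cloudflare"
```

With Scrapy, the header mapping could come from `response.headers.to_unicode_dict()`, since Scrapy stores header values as bytes. Logging how often this returns True after a replay would show how long the cookie and headers stay valid in practice.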
