OiO.lk Community platform!


How to share DOWNLOAD_DELAY in Scrapyd?

  • Thread starter: HairlessVillager (Guest)
I have a service that sends crawling requests for the same spider at random times, and the interval between two crawling requests may be shorter than DOWNLOAD_DELAY. I want to use Scrapyd to handle these crawling requests, but the interval between the HTTP requests that are actually sent should never be shorter than DOWNLOAD_DELAY. This seems to require different Scrapyd jobs to share the same DOWNLOAD_DELAY. How can I implement that?

I conducted two experiments: the first sent a single crawl request, the second sent two consecutive crawl requests. Each crawl request was sent with the following command:

Code:
curl http://localhost:6800/schedule.json -d project=quotesbot -d spider=toscrape-xpath -d setting=DOWNLOAD_DELAY=5 -d setting=RANDOMIZE_DOWNLOAD_DELAY=False

and all three elapsed_time_seconds values in the logs were the same (about 50 s).

I've searched the Scrapyd issues and found only #221 (https://github.com/scrapy/scrapyd/issues/221), which raised the same question in 2017 without a solution.
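Scrapyd starts each job as a separate Scrapy process with its own downloader, so a per-process setting like DOWNLOAD_DELAY is never shared between jobs. One workaround is to coordinate through an external resource that all jobs can see. Below is a minimal sketch, assuming all jobs run on the same POSIX host; `SharedDelayGate` and the lock-file path are hypothetical helpers, not a Scrapy or Scrapyd feature:

```python
# Sketch: a cross-process "gate" that enforces a minimum interval between
# requests by storing the last-request timestamp in a lock-protected file.
# SharedDelayGate is a hypothetical helper, not part of Scrapy or Scrapyd.
import fcntl
import time


class SharedDelayGate:
    def __init__(self, path, delay):
        self.path = path    # lock file shared by all Scrapyd job processes
        self.delay = delay  # minimum seconds between requests, globally

    def wait(self):
        """Block until at least `delay` seconds have passed since the last
        request made by ANY process using the same lock file."""
        with open(self.path, "a+") as f:
            fcntl.flock(f, fcntl.LOCK_EX)  # exclusive lock, released on close
            f.seek(0)
            raw = f.read().strip()
            last = float(raw) if raw else 0.0
            remaining = self.delay - (time.time() - last)
            if remaining > 0:
                time.sleep(remaining)      # other processes queue on the lock
            f.seek(0)
            f.truncate()
            f.write(repr(time.time()))     # record this request's timestamp
```

A spider or custom downloader middleware could call `gate.wait()` before each request. Note that a blocking sleep stalls Scrapy's Twisted reactor, so a production version would need a non-blocking variant, or a shared store such as Redis if jobs are spread across multiple hosts.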