OiO.lk Community platform!

Changing window size for scrapy spider

  • Thread starter: 阿聰MrOnion

I am very new to web scraping and have just watched a 4-hour video tutorial on YouTube.

Recently I have been trying to get data from jobsdb.com, and some of the fields always come back empty. What is stranger is that I can actually see the data in the browser. After a long time of trying, I realised that some of the data is hidden when I make the browser window too small, and maybe this is the reason why my Scrapy spider cannot get that data.

Website I am using now: https://hk.jobsdb.com/data-engineer-jobs

Code:
# I can get the title but not the job description
Jobs = response.css('[data-automation="normalJob"]')

# .get() returns the first match, or None when nothing matches
Title = Jobs.css('[data-automation="jobTitle"]::text').get()
Job_Desc = Jobs.css('[data-automation="jobShortDescription"]::text').get()

Please tell me if you need the whole code. The code runs without errors; it just cannot get some of the data, such as Job_Desc.

I am not really sure how Scrapy gets its data. Maybe it just downloads the HTML, so it ignores the window size?

So I have thought of some possible solutions:

  1. Add some code to tell Scrapy to use a bigger window size
  2. Change the URL (maybe add something like ?window-size="1920x1080")
  3. Add some JS using scrapy-splash

These are all the solutions I can think of, but none of them seem to work...
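
One way to check the guess that Scrapy only sees the downloaded HTML (in which case window size would not matter) is to look for the field in the raw HTML text itself. The following is a minimal sketch using only Python's standard library, not Scrapy. The helper `texts_in_html`, the class `AutomationTextCollector`, and the `SAMPLE_HTML` markup are made up for illustration; the real jobsdb.com markup may look different.

```python
from html.parser import HTMLParser


class AutomationTextCollector(HTMLParser):
    """Collect text inside elements that carry a given data-automation value."""

    # Void elements never get a closing tag, so they must not change the depth.
    VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

    def __init__(self, automation_value):
        super().__init__()
        self.automation_value = automation_value
        self.depth = 0      # > 0 while inside a matching element
        self.current = []   # text pieces of the element being read
        self.results = []   # one joined string per matching element

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif dict(attrs).get("data-automation") == self.automation_value:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in self.VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append(" ".join(self.current))

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.current.append(data.strip())


def texts_in_html(html, automation_value):
    """Return the text of every element whose data-automation matches."""
    parser = AutomationTextCollector(automation_value)
    parser.feed(html)
    parser.close()
    return parser.results


# Hypothetical markup, NOT copied from the real site.
SAMPLE_HTML = """
<article data-automation="normalJob">
  <a data-automation="jobTitle">Data Engineer</a>
  <span data-automation="jobShortDescription"><span>Build data pipelines</span></span>
</article>
"""

print(texts_in_html(SAMPLE_HTML, "jobTitle"))             # text directly in the element
print(texts_in_html(SAMPLE_HTML, "jobShortDescription"))  # text nested one level deeper
print(texts_in_html(SAMPLE_HTML, "jobSalary"))            # attribute not in the HTML at all
```

Inside a spider, the same helper could be called as `texts_in_html(response.text, "jobShortDescription")`. If that returns an empty list, the field is not in the HTML that Scrapy downloaded, and no window size setting would bring it back. If it returns text while the selector above still gives `None`, the text may be nested in a child element: in Scrapy selectors `::text` only matches text nodes that are direct children of the selected element, while ` ::text` (with a leading space) also matches text in descendants.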

 
