Scrapy returns a blank value for one field when scraping a web page

  • Thread starter: undecided000
I am trying to scrape this web page with Scrapy. I can get all the data I need except the race distance. The page: https://www.thedogs.com.au/racing/albion-park/2024-05-30/10/tab-flying-amy-classic-h?trial=false

The distance shown on the page is 520m, but my spider returns an empty value for it. How do I scrape this value? The problem is the `distance` line in the code below.

Code:
# Imports this excerpt relies on
from scrapy.linkextractors import LinkExtractor
from scrapy.selector import Selector
from scrapy.spiders import Rule

# Excerpt from the CrawlSpider subclass (class definition omitted)
# Follow each race link in the meetings table and parse it with parse_item
rules = (
    Rule(LinkExtractor(restrict_xpaths="//td[@class='meetings-venues__race-time']/a"), callback='parse_item', follow=True),
)

def parse_item(self, response):
    item = {}

    hxs = Selector(response)
    # One <tr> per greyhound in the race results table
    divs = hxs.xpath('//tr[@class="accordion__anchor race-runner"]')

    # titles = [hxs.select('//tr[@class="index class_tr group-6487"] | //tr[@class="index class_tr group-6488"] | //tr[@class="index class_tr group-6489"]')]

    for div in divs:
        item = {
            'grade' : div.xpath(".//td[@class='race-runners__grade']/text()").extract(),
            'greyhound' : div.xpath('./td[3]/div[1]/a/text()').extract(),
            'position' : div.xpath('./td[1]/text()').extract(),
            'trainer' : div.xpath(".//div[@class='race-runners__name__trainer']/a/text()").extract(),
            'weight' : div.xpath(".//td[@class='race-runners__weight']/text()").extract(),
            'first_sec' : div.xpath(".//td[@class='race-runners__sectional']/text()").extract_first(),
            'second_sec' : div.xpath(".//td[@class='race-runners__sectional'][2]/text()").extract(),
            'time' : div.xpath(".//td[@class='race-runners__time']/text()").extract(),
            'margin' : div.xpath(".//td[@class='race-runners__margin']/text()").extract(),
            # <- this field comes back as an empty list for every runner
            'distance' : div.xpath(".//div[@class='race-header__info__grade']/a/text()").extract(),
            'starting_price' : div.xpath(".//td[@class='race-runners__starting-price']/text()").extract(),
            'date' : response.url.split('/')[-3],
            'track' : response.url.split('/')[-4],
            'rug' : div.xpath('.//td[@class="table__cell--tight race-runners__box"]/sprite-svg/@name').get()
            #'rug' : div.xpath('//td[@class="table__cell--tight race-runners__box"]/sprite-svg/@name').extract()
            }

        yield item