OiO.lk Community platform!


Scrapy splash does not load dynamic content

  • Thread starter: mhtuan (Guest)
I am using Splash with Scrapy to load dynamically rendered content on a page, but it does not work as expected.

In settings.py I set these variables:

Code:
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
}
SPLASH_URL = "http://localhost:8050"
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
SPLASH_COOKIES_DEBUG = False
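
For comparison, the scrapy-splash README also lists Scrapy's HttpCompressionMiddleware at priority 810 in DOWNLOADER_MIDDLEWARES, which the settings above do not include. Whether that is related to the missing content is not established here. A minimal sketch that reports which README entries a settings dict lacks; `README_DOWNLOADER_MIDDLEWARES` and `missing_middlewares` are hypothetical names for illustration, not part of scrapy-splash:

```python
# Downloader middleware entries as listed in the scrapy-splash README
# (assumption: this matches the README of the installed version).
README_DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}


def missing_middlewares(configured):
    """Return the README entries that are absent from `configured`, sorted by name."""
    return sorted(
        name for name in README_DOWNLOADER_MIDDLEWARES if name not in configured
    )


# The DOWNLOADER_MIDDLEWARES dict from the post.
current = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
}

print(missing_middlewares(current))
```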

The spider:

Code:
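import scrapy
from scrapy_splash import SplashRequest


class SanPhamSpider(scrapy.Spider):
    # Class and spider names are placeholders; the original post shows only the methods.
    name = "san_pham"

    def start_requests(self):
        urls = [
            "https://callmeduy.com/san-pham/",
        ]
        for url in urls:
            # SplashRequest uses the render.html endpoint by default.
            yield SplashRequest(
                url=url,
                # endpoint='render.html',
                callback=self.parse,
                args={"wait": 5},  # seconds Splash waits after the page loads
            )

    def parse(self, response):
        body = response.xpath("//body").get() or ""
        print(body)
        # Save the rendered body so it can be inspected in a browser.
        with open("res.html", "w", encoding="utf-8") as f:
            f.write(body)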
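
One way to narrow this down might be to ask Splash for the page directly, outside Scrapy, and compare the result with res.html. Splash exposes a render.html HTTP endpoint that accepts `url` and `wait` query parameters. A minimal sketch that builds such a request URL, assuming Splash is reachable at the SPLASH_URL from the settings; `render_html_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode


def render_html_url(splash_url, page_url, wait=5):
    """Build a Splash render.html request URL for `page_url`."""
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_url.rstrip('/')}/render.html?{query}"


url = render_html_url(
    "http://localhost:8050",
    "https://callmeduy.com/san-pham/",
    wait=5,
)
print(url)

# Fetching the URL requires a running Splash instance, for example:
#   import urllib.request
#   html = urllib.request.urlopen(url, timeout=60).read().decode("utf-8")
```

If the HTML returned this way also lacks the dynamic content, the problem is more likely in how Splash renders the page than in the Scrapy settings.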

The dynamic content has not been loaded. Here is the response body: https://i.sstatic.net/CbP9ICgr.png

Please help if anybody knows.
 
