October 22, 2024
python

LookupError: unknown encoding: 'b'utf8''


I don’t know why, but I am getting LookupError: unknown encoding: 'b'utf8'' when I try to scrape and parse a Walmart product page.
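The encoding name quoted in the error, 'b'utf8'', has the exact shape you get when a bytes object b'utf8' is turned into a string with str() and then used as an encoding name. The snippet below only reproduces that message shape with the standard codecs module. It does not show where lxml or parsel would do this; that part is an open question and this is a diagnostic sketch, not a fix.

```python
import codecs

# An encoding name built by calling str() on a bytes object keeps the b'' prefix.
encoding_name = str(b"utf8")
print(encoding_name)  # b'utf8'

# No codec is registered under that name, so the lookup raises LookupError.
try:
    codecs.lookup(encoding_name)
    error_message = None
except LookupError as exc:
    error_message = str(exc)
print(error_message)  # unknown encoding: b'utf8'

# Decoding the bytes to str first yields a name the codec registry accepts.
print(codecs.lookup(b"utf8".decode("ascii")).name)  # utf-8
```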

I have already set the encoding to utf-8 and also tried removing the BOM (byte order mark), as suggested in this post: lxml LookupError occured. Arguments: ("unknown encoding: 'b'utf-8-sig''",).
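For comparison, Python's built-in utf-8-sig codec removes a leading BOM while decoding, which is what the manual content[3:] slice in the code does. A small check of that equivalence (decode_without_bom is a hypothetical helper name, not part of any library):

```python
import codecs


def decode_without_bom(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading byte order mark if there is one."""
    # 'utf-8-sig' strips a leading BOM (b'\xef\xbb\xbf') and otherwise behaves like 'utf-8'.
    return raw.decode("utf-8-sig")


raw = codecs.BOM_UTF8 + "<html>caf\u00e9</html>".encode("utf-8")

# The manual approach: slice off the 3 BOM bytes, then decode as plain 'utf-8'.
manual = raw[len(codecs.BOM_UTF8):].decode("utf-8")

print(decode_without_bom(raw) == manual)  # True
print(decode_without_bom(b"<html/>"))     # <html/>  (no BOM: unchanged)
print(repr(raw.decode("utf-8")[0]))       # '\ufeff' (plain 'utf-8' keeps the BOM as U+FEFF)
```

Either way, the str passed to Selector(text=...) no longer starts with U+FEFF, so the two approaches should be interchangeable.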

Appreciate any help or pointers!

Complete code:

import httpx
from parsel import Selector
import json

# Fake browser-like headers
BASE_HEADERS = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "accept-language": "en-US,en;q=0.9",
    "accept-encoding": "gzip, deflate, br",  # httpx only decodes "br" if the brotli package is installed
}

response = httpx.get("https://www.walmart.com/product-page-url", headers=BASE_HEADERS)
if response.encoding is None:
    # Note: this only affects response.text; the code below decodes response.content directly
    response.encoding = 'utf-8'

# Remove BOM if present
content = response.content
if content.startswith(b'\xef\xbb\xbf'):
    content = content[3:]  # Remove the BOM

response_text = content.decode('utf-8')
sel = Selector(text=response_text)
data = sel.xpath('//script[@id="__NEXT_DATA__"]/text()').get()

if data:
    data = json.loads(data)
    product = data["props"]["pageProps"]["initialData"]["data"]["product"]
    print(product)
else:
    print("No product data found.")
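To narrow down whether the error comes from the lxml side, the extraction step can be tried without lxml. The sketch below swaps parsel/lxml for the standard library's html.parser; it is a diagnostic under that assumption, not a fix. If it succeeds on the same response_text while Selector(text=...) raises, that would point away from the page content and the BOM handling. NextDataExtractor, dig and extract_product are hypothetical names, and the key path is copied from the code above; it may not match every Walmart page, so missing keys return None instead of raising KeyError.

```python
import json
from html.parser import HTMLParser
from typing import Any, Optional


class NextDataExtractor(HTMLParser):
    """Collect the text of <script id="__NEXT_DATA__"> using only the stdlib parser."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script content may arrive in several chunks; keep all of them
        if self._inside:
            self._chunks.append(data)

    @property
    def text(self) -> Optional[str]:
        return "".join(self._chunks) or None


def dig(obj: Any, *keys: str) -> Any:
    """Follow nested dict keys, returning None instead of raising KeyError."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def extract_product(html: str) -> Any:
    parser = NextDataExtractor()
    parser.feed(html)
    parser.close()
    if parser.text is None:
        return None
    data = json.loads(parser.text)
    return dig(data, "props", "pageProps", "initialData", "data", "product")


sample = (
    '<html><head><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"initialData": {"data": {"product": {"name": "Example"}}}}}}'
    "</script></head></html>"
)
print(extract_product(sample))  # {'name': 'Example'}
```

Against the real page this would be called as extract_product(response_text), reusing the decoded text from the code above.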


