Header for AWS Cloud 9 web scraping

Thread starter: Victor Resende (Guest)
I wrote a web scraping script on my personal desktop (Windows). After migrating it to a Linux AWS Cloud9 environment, the request returns a 403 error. Could the headers configuration be the cause?

Code:
import requests
from bs4 import BeautifulSoup

# Placeholder: the original post does not show the URL being requested.
url = "https://www.reclameaqui.com.br/"

# Browser-like headers copied from a Chrome 91 session on Windows.
headers = {
    "authority": "www.reclameaqui.com.br",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "accept-language": "pt-BR,pt;q=0.9",
    "cache-control": "max-age=0",
    "content-type": "text/html; charset=utf-8",
    "origin": "https://www.reclameaqui.com.br",
    "referer": "https://www.reclameaqui.com.br/",
    "sec-ch-ua": '"Not.A/Brand";v="99", "Google Chrome";v="91", "Chromium";v="91"',
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": '"Windows"',
    "sec-fetch-dest": "document",
    "sec-fetch-mode": "navigate",
    "sec-fetch-site": "same-origin",
    "sec-fetch-user": "?1",
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

# The posted snippet used `response` without defining it; this call is assumed.
response = requests.get(url, headers=headers)

if response.status_code == 200:
    # Parse the page only when the server accepted the request.
    html_content = response.content
    soup = BeautifulSoup(html_content, 'html.parser')
else:
    print(response.status_code)
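One way to narrow down the 403 is to send the same request from both machines and compare what comes back. The sketch below is only a diagnostic aid, not a confirmed fix. It assumes the target URL (the post does not show it), and the helper names `trim_headers`, `describe_block`, and `probe` are made up for illustration. It drops three headers that a browser would not normally send as literal headers on a top-level GET: `authority` (an HTTP/2 pseudo-header, `:authority`, that the client sets itself), `content-type` (which describes a request body), and `origin`. Whether any of them matters to this particular site is a guess.

```python
# Hypothetical target; the original post does not show the URL it requests.
URL = "https://www.reclameaqui.com.br/"

# Assumption: these headers look out of place on a plain GET navigation.
SUSPECT_HEADERS = {"authority", "content-type", "origin"}


def trim_headers(headers):
    """Return a copy of headers without the entries named in SUSPECT_HEADERS."""
    return {k: v for k, v in headers.items() if k.lower() not in SUSPECT_HEADERS}


def describe_block(status_code, response_headers):
    """Pick out response fields that may hint at which layer refused the request."""
    lowered = {k.lower(): v for k, v in response_headers.items()}
    return {
        "status": status_code,
        "server": lowered.get("server"),
        "content_type": lowered.get("content-type"),
    }


def probe(url, headers):
    """Send one GET with the trimmed headers and summarize the reply (needs network)."""
    # Imported here so the helpers above work without the dependency installed.
    import requests

    response = requests.get(url, headers=trim_headers(headers), timeout=10)
    return describe_block(response.status_code, response.headers)
```

Running `probe(URL, headers)` on the Windows desktop and again on Cloud9 gives two small dictionaries to compare. If the status differs while the headers sent are identical, the headers are probably not the deciding factor; one possibility, which the post does not confirm, is that the site refuses requests from cloud provider IP ranges.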
 
