I am trying to build a news scraper, but I can't access the Wall Street Journal site. I have a subscription to the site and I am sending my CSRF token, but I am still denied access. I contacted WSJ support but have not received a response yet. Is there another way around this?
When I run the code below, it prints "Login successful", but then it raises an exception: "requests.exceptions.HTTPError: 403 Client Error: Forbidden for url:"
import requests
from bs4 import BeautifulSoup

login_url = "https://id.wsj.com/auth/login"
news_url = "https://www.wsj.com/"

with requests.Session() as session:
    login_page = session.get(login_url)
    login_page.raise_for_status()
    payload = {
        'username':
        'password':
        'csrfToken':
    }
    login_response = session.post(login_url, data=payload)
    login_response.raise_for_status()
    if login_response.ok:
        print("Login successful")
        response = session.get(news_url)
        response.raise_for_status()
        if response.ok:
            soup = BeautifulSoup(response.text, 'html.parser')
            headlines = soup.find_all('h3')
            for headline in headlines:
                print(headline.text)
    else:
        print("Login failed")
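A note on what "Login successful" shows here: `login_response.ok` is true for any status code below 400, and `raise_for_status()` has already raised for anything else, so the message only means the POST was not rejected outright. It does not confirm that the session is authenticated. To narrow down where the 403 comes from, it may help to print a few fields of each response. The following is a rough sketch, not a fix: `summarize_response` is a hypothetical helper name, and the `SimpleNamespace` object is only a stand-in so the snippet runs without touching the network.

```python
from types import SimpleNamespace


def summarize_response(resp):
    """Collect fields that help tell a real login from a plain HTTP 200.

    `resp` is expected to look like a requests.Response: it needs
    .status_code, .url, .history (earlier redirect responses) and .cookies.
    """
    return {
        "status": resp.status_code,
        "final_url": resp.url,
        # Status codes of any redirects followed before the final response.
        "redirects": [r.status_code for r in resp.history],
        # Names only, so no cookie values end up in logs.
        "cookie_names": sorted(resp.cookies.keys()),
    }


# Stand-in object (an assumption for illustration, not real WSJ output).
fake_response = SimpleNamespace(
    status_code=403,
    url="https://www.wsj.com/",
    history=[SimpleNamespace(status_code=302)],
    cookies={"b": "1", "a": "2"},
)

summary = summarize_response(fake_response)
print(summary)
```

In the script above, this could be called as `summarize_response(login_response)` right after `session.post(...)` and again on the response from `session.get(news_url)` before `raise_for_status()`, to see whether the login step set any cookies and where each request ended up.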