Trying to build a news scraper, but can't access the Wall Street Journal site

I'm trying to build a news scraper, but I can't access the Wall Street Journal site. I have a subscription to the site and I send my CSRF token with the login request, but I am still denied access. I contacted the WSJ support team but have not received a response yet. Is there another way around this?

When I run the code, it prints "Login successful", but then it raises an exception: "requests.exceptions.HTTPError: 403 Client Error: Forbidden for url:"

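import requests
from bs4 import BeautifulSoup

login_url = "https://id.wsj.com/auth/login"
news_url = "https://www.wsj.com/"

with requests.Session() as session:
    # Load the login page first so the session keeps any cookies it sets.
    login_page = session.get(login_url)
    login_page.raise_for_status()

    # The values were left out of the original post; the empty strings
    # are placeholders to be replaced with your own details.
    payload = {
        'username': "",
        'password': "",
        'csrfToken': "",
    }
    login_response = session.post(login_url, data=payload)
    # raise_for_status() raises on any 4xx/5xx status, so reaching the next
    # line only means the POST did not return an HTTP error.
    login_response.raise_for_status()
    print("Login successful")

    response = session.get(news_url)
    # This is the call that raises the 403 HTTPError.
    response.raise_for_status()

    # Print the text of every <h3> element on the page.
    soup = BeautifulSoup(response.text, 'html.parser')
    for headline in soup.find_all('h3'):
        print(headline.text)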
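
One thing worth checking in the code above: "Login successful" is printed whenever the login POST returns a status below 400, which by itself does not show that the credentials were accepted. The 403 is raised later, by raise_for_status() on the request to news_url. A minimal sketch of a helper that reports a response's status instead of raising is shown below. The name summarize_response and the stand-in response objects are hypothetical and only for illustration; the helper relies on the status_code, url, and headers attributes, which requests.Response provides.

```python
from types import SimpleNamespace


def summarize_response(resp) -> str:
    """Describe an HTTP response without raising an exception.

    `resp` is any object with `status_code`, `url`, and `headers`
    attributes (a `requests.Response` has all three).
    """
    code = resp.status_code
    if code < 400:
        # A non-error status only means the server answered normally.
        verdict = "no HTTP error (does not prove the login was accepted)"
    elif code in (401, 403):
        verdict = "server refused the request"
    else:
        verdict = "HTTP error"
    server = resp.headers.get("Server", "unknown")
    return f"{code} for {resp.url}: {verdict}; Server header: {server}"


# Hypothetical stand-in objects so the sketch runs without network access.
fake_login = SimpleNamespace(
    status_code=200, url="https://id.wsj.com/auth/login", headers={}
)
fake_news = SimpleNamespace(
    status_code=403, url="https://www.wsj.com/", headers={"Server": "example"}
)

print(summarize_response(fake_login))
print(summarize_response(fake_news))
```

In the script above, the same helper could be called as summarize_response(login_response) and summarize_response(response) before the raise_for_status() calls, to see which request is refused and what the server reports about itself.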
