How to Bypass HTTP 403 Error When Scraping CoinGecko with Python?

  • Thread starter: HamidBee (Guest)
I am trying to scrape the Bitcoin markets section from CoinGecko using Python. However, I keep encountering an HTTP 403 error. I have tried using the requests library with custom headers to mimic a real browser, but I still get the same error.

Here is the code I am using:

Code:
import requests
import pandas as pd

# Base URL for Bitcoin markets on CoinGecko
base_url = "https://www.coingecko.com/en/coins/bitcoin"

# Function to fetch a single page
def fetch_page(url, page):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    response = requests.get(f"{url}?page={page}", headers=headers)
    if response.status_code != 200:
        print(f"Failed to fetch page {page}: Status code {response.status_code}")
        return None
    return response.text

# Function to extract market data from a page
def extract_markets(html):
    try:
        dfs = pd.read_html(html)  # raises ValueError if no tables are found
    except ValueError:
        return pd.DataFrame()
    return dfs[0]

# Main function to scrape all pages
def scrape_all_pages(base_url, max_pages=10):
    all_markets = []
    for page in range(1, max_pages + 1):
        print(f"Scraping page {page}...")
        html = fetch_page(base_url, page)
        if html is None:
            break
        df = extract_markets(html)
        if df.empty:
            break
        all_markets.append(df)

    return pd.concat(all_markets, ignore_index=True) if all_markets else pd.DataFrame()

# Scrape data and store in a DataFrame
max_pages = 10  # Adjust this to scrape more pages if needed
df = scrape_all_pages(base_url, max_pages)

# Display the DataFrame
print(df)

Error:

Code:
Scraping page 1...
Failed to fetch page 1: Status code 403
Empty DataFrame
Columns: []
Index: []

I also tried a suggested solution on Stack Overflow, but it did not resolve the issue.

Could someone suggest a workaround or a more effective way to scrape this data? Any help would be greatly appreciated. Thank you in advance.
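
For reference, one workaround that sidesteps the 403 entirely is to skip the HTML page and query CoinGecko's documented public API, which serves the same markets data as JSON. The sketch below assumes the v3 /coins/{id}/tickers endpoint with its page parameter; the field names pulled from each ticker follow the public API's response format and may need adjusting:

Code:
import requests
import pandas as pd

# CoinGecko's public API returns the markets ("tickers") table as JSON,
# so no HTML scraping (and no anti-bot wall) is involved.
API_URL = "https://api.coingecko.com/api/v3/coins/bitcoin/tickers"

def fetch_tickers(page):
    response = requests.get(API_URL, params={"page": page}, timeout=10)
    response.raise_for_status()  # also surfaces rate-limit errors (HTTP 429)
    return response.json().get("tickers", [])

def fetch_all_tickers(max_pages=10):
    rows = []
    for page in range(1, max_pages + 1):
        tickers = fetch_tickers(page)
        if not tickers:
            break  # no more pages
        for t in tickers:
            rows.append({
                "exchange": t["market"]["name"],
                "pair": f"{t['base']}/{t['target']}",
                "last_price": t["last"],
                "volume": t["volume"],
            })
    return pd.DataFrame(rows)

df = fetch_all_tickers()
print(df)

If the rendered HTML page itself is required, the 403 most likely comes from the site's anti-bot protection, which plain header spoofing rarely gets past. The third-party cloudscraper package (a drop-in wrapper around requests) attempts to solve such challenges automatically, though there is no guarantee it works against CoinGecko today:

Code:
import cloudscraper

# create_scraper() returns a requests.Session subclass that tries to
# pass the anti-bot challenge before handing back the page.
scraper = cloudscraper.create_scraper()
html = scraper.get("https://www.coingecko.com/en/coins/bitcoin?page=1").text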