SSL Certificate Verification Error When Scraping Website and Inserting Data into MongoDB

Thread starter: Boddula Rishil (Guest)
Problem Description:

I'm attempting to scrape the website at https://www.cbit.ac.in/current_students/acedamic-calendar/ using the requests library along with BeautifulSoup. However, upon making a request to the website, I encounter the following SSL certificate verification error:

Code:
requests.exceptions.SSLError:
  HTTPSConnectionPool(host='www.cbit.ac.in', port=443):
    Max retries exceeded with url:
      /current_students/acedamic-calendar/
      (Caused by SSLError(SSLCertVerificationError(1,
        '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1123)')))
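
Just to confirm the failure is not specific to requests, the same handshake can be reproduced with the standard library ssl module (a minimal sketch; it assumes certifi, which ships with requests, is installed):

Code:
import socket
import ssl

import certifi

HOST = "www.cbit.ac.in"

# Trust only certifi's CA bundle, the same default that requests uses
ctx = ssl.create_default_context(cafile=certifi.where())

try:
    with socket.create_connection((HOST, 443), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
            # Handshake succeeded: show the issuer of the server certificate
            print(tls.getpeercert()["issuer"])
except ssl.SSLCertVerificationError as exc:
    # "unable to get local issuer certificate" usually means the server does not
    # send its intermediate certificate and the local bundle cannot complete the chain
    print("Verification failed:", exc.verify_message)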

Approach:

To address the SSL verification issue, I've attempted to specify the path to the CA certificate using the verify parameter in the requests.get() function call. The CA certificate path is /Users/rishilboddula/Downloads/cbit.ac.in.cer. Despite this, the SSL verification error persists.
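
One thing I am unsure about is the format of the downloaded .cer file. As far as I understand, verify= expects a PEM-encoded bundle, and that bundle has to contain the issuing CA chain, not just the site's own certificate. The check below is only a sketch (the .pem output path is my own naming); it converts a DER-encoded file to PEM with the standard ssl module:

Code:
import ssl

ca_cert_path = '/Users/rishilboddula/Downloads/cbit.ac.in.cer'

with open(ca_cert_path, 'rb') as f:
    data = f.read()

if data.lstrip().startswith(b'-----BEGIN CERTIFICATE-----'):
    print('Certificate is already PEM-encoded')
else:
    # Assume the file is DER-encoded and write a PEM copy that
    # requests (via OpenSSL) can load through the verify parameter
    pem_path = ca_cert_path + '.pem'
    with open(pem_path, 'w') as f:
        f.write(ssl.DER_cert_to_PEM_cert(data))
    print('Wrote PEM copy to', pem_path)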

After successfully scraping the website, I intend to store the extracted URLs in a MongoDB collection named ull using the pymongo library. However, due to the SSL verification error, I'm unable to proceed with the scraping and data insertion process.

Request for Assistance:

I'm seeking guidance on resolving the SSL certificate verification error to successfully scrape the website and insert the data into MongoDB. Additionally, if there are any best practices or alternative approaches for handling SSL certificate verification in Python, I would greatly appreciate any insights.
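
One alternative I have been reading about, shown here before my full script below, is to append the site's certificate chain to certifi's bundle and point verify (or the REQUESTS_CA_BUNDLE environment variable) at the combined file. This is only a sketch; the combined-bundle path is my own naming, and it assumes the .cer file is PEM-encoded:

Code:
import shutil

import certifi
import requests

site_ca_path = '/Users/rishilboddula/Downloads/cbit.ac.in.cer'           # must be PEM
combined_path = '/Users/rishilboddula/Downloads/combined-ca-bundle.pem'  # hypothetical output path

# Start from certifi's trusted bundle, then append the site's certificate(s)
shutil.copyfile(certifi.where(), combined_path)
with open(combined_path, 'a') as bundle, open(site_ca_path) as extra:
    bundle.write('\n' + extra.read())

# Use the combined bundle for this request
# (equivalently: export REQUESTS_CA_BUNDLE=/Users/rishilboddula/Downloads/combined-ca-bundle.pem)
req = requests.get(
    'https://www.cbit.ac.in/current_students/acedamic-calendar/',
    verify=combined_path,
)
print(req.status_code)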

Code:
# Import necessary libraries
import requests
from bs4 import BeautifulSoup
import pymongo

# Specify the path to the CA certificate
ca_cert_path = '/Users/rishilboddula/Downloads/cbit.ac.in.cer'

# Make a request to the website with SSL verification
req = requests.get('https://www.cbit.ac.in/current_students/acedamic-calendar/', verify=ca_cert_path)

# Parse the HTML content
soup = BeautifulSoup(req.content, 'html.parser')

# Extract all URLs from the webpage
links = soup.find_all('a')
urls = [link.get('href') for link in links]

# Connect to MongoDB
client = pymongo.MongoClient('mongodb://localhost:27017')
db = client["data"]
ull = db["ull"]

# Insert each URL into the MongoDB collection
for url in urls:
    ull.insert_one({"url": url})
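
For the MongoDB step, a small variation I am considering (a sketch that continues from the urls list built above and assumes the same local instance and ull collection) skips empty hrefs and upserts on the url field so re-running the scraper does not create duplicate documents:

Code:
import pymongo

client = pymongo.MongoClient('mongodb://localhost:27017')
ull = client["data"]["ull"]

# urls is the list produced by the scraping code above;
# find_all('a') can return tags without an href, so drop None/empty values
clean_urls = [u for u in urls if u]

# Upsert on the url field so repeated runs do not insert duplicates
operations = [
    pymongo.UpdateOne({"url": u}, {"$set": {"url": u}}, upsert=True)
    for u in clean_urls
]
if operations:
    result = ull.bulk_write(operations)
    print("Upserted:", result.upserted_count)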