Web scraping IMDb movies

  • Thread starter: gonza (Guest)

I have a problem with my code. I'm trying to scrape the top 250 movies on IMDb from this URL: https://www.imdb.com/chart/top/

The problem is that I can only extract 25 movies, and I want all 250. This is my code:

Code:
from typing import Optional

from bs4 import BeautifulSoup
import pandas as pd
import requests
from requests.exceptions import HTTPError

# Browser-like headers sent with the request.
encabezados = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edge/101.0.1210.53",
    'Accept-Language': 'en-us,en;q=0.5'
}


def rastrear_sitio_web(url: str, headers: dict) -> Optional[str]:
    """Download a page and return its HTML, or None on an HTTP error."""
    try:
        respuesta = requests.get(url, headers=headers)
        respuesta.raise_for_status()
    except HTTPError as exc:
        print(exc)
        return None
    return respuesta.text


URL = 'https://www.imdb.com/chart/top/'
contenido = rastrear_sitio_web(url=URL, headers=encabezados)
if contenido is None:
    raise SystemExit("Could not download the page")
pagina = BeautifulSoup(contenido, 'html.parser')

# One list per output column.
año, ranking, titulo, nota, tiempo, rated = [], [], [], [], [], []

tabla = pagina.find('div', {'data-testid': 'chart-layout-main-column'})
peliculas = tabla.find("ul")

# Each <li> is one movie; its text fragments are joined with ";".
for pelicula in peliculas.find_all('li'):
    campos = pelicula.get_text(";").strip().split(";")
    # campos[0] is expected to look like "1. Title". Split only on the
    # first "." so that titles containing periods stay intact.
    posicion, nombre = campos[0].split(".", 1)
    ranking.append(posicion)
    titulo.append(nombre.strip())
    año.append(campos[1])
    tiempo.append(campos[2])
    rated.append(campos[3])
    nota.append(campos[4])

datos = {'Ranking': ranking, 'Título': titulo, 'Año': año, 'Calificación': nota, 'Duracion': tiempo, 'Rated': rated}
print(datos)
contenido_extraido = pd.DataFrame(data=datos)

I tried changing functions and changing the classes in the HTML code, but it doesn't work. I also tried different code samples, but they have the same problem.
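
One possible explanation for the 25-item limit is that the HTML returned to `requests` contains only the first batch of `<li>` elements, with the rest added later by JavaScript. If so, and if the page also embeds a `<script type="application/ld+json">` block describing the full list, that block could be parsed instead of the visible list. The sketch below shows the idea using only the standard library. The JSON field names (`itemListElement`, `item`, `aggregateRating`, and so on) follow schema.org vocabulary and are assumptions about what the page contains, `JsonLdCollector` and `extract_movies` are hypothetical helper names, and `SAMPLE_HTML` is made-up data used in place of the real page.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def extract_movies(html: str) -> list:
    """Return one dict per movie found in an embedded ItemList, if any."""
    collector = JsonLdCollector()
    collector.feed(html)
    movies = []
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if not isinstance(data, dict) or data.get("@type") != "ItemList":
            continue
        # Assumed layout: entries appear in chart order.
        for position, entry in enumerate(data.get("itemListElement", []), start=1):
            item = entry.get("item", {})
            rating = item.get("aggregateRating", {})
            movies.append({
                "Ranking": position,
                "Título": item.get("name"),
                "Calificación": rating.get("ratingValue"),
                "Duracion": item.get("duration"),
                "Rated": item.get("contentRating"),
            })
    return movies


# Made-up stand-in for the downloaded page (not real IMDb data).
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "ItemList", "itemListElement": [
  {"item": {"name": "Example Movie A", "duration": "PT2H22M",
            "contentRating": "R", "aggregateRating": {"ratingValue": 9.3}}},
  {"item": {"name": "Example Movie B", "duration": "PT2H55M",
            "contentRating": "PG", "aggregateRating": {"ratingValue": 9.2}}}
]}
</script>
</head><body><ul><li>1. Example Movie A</li></ul></body></html>
"""

movies = extract_movies(SAMPLE_HTML)
```

With the real page, `contenido` from the code above would be passed in place of `SAMPLE_HTML`, and the resulting list of dicts can go straight into `pd.DataFrame(movies)`. If the function returns an empty list, the page does not match the assumed layout, and a browser-driven tool may be needed to render the full list.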