Web Scraping with Python and Pyppeteer

Marlon Falcon Hernandez, pythonwebscraping

Web scraping is a technique in which software programs extract information from websites. These programs usually simulate a human browsing the World Wide Web, either by using the HTTP protocol directly or by embedding a browser in an application.
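As a minimal sketch of the first approach (working directly with the HTML that an HTTP request returns), the following uses only the Python standard library to pull link targets out of an HTML string. The LinkCollector class, the extract_links function and the sample HTML are illustrative assumptions; in practice the string would be the body of an HTTP response.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page content standing in for a downloaded document
SAMPLE_HTML = (
    '<html><body><a href="/about">About</a> '
    '<a href="https://www.python.org/">Python</a></body></html>'
)

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
```

This approach only sees the HTML as delivered by the server. Pages that build their content with JavaScript need the second approach, an embedded browser, which is what Pyppeteer provides.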

Installing Pyppeteer

pip install pyppeteer
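A quick way to confirm that the package is importable before running the examples is to ask the import system for it. This is only a sketch: is_installed is a hypothetical helper name, and it checks that the module can be found, not that a Chromium binary is available.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("pyppeteer"):
        print("pyppeteer is available")
    else:
        print("pyppeteer is missing; run: pip install pyppeteer")
```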

Web Scraping Example

import asyncio
from pyppeteer import launch

async def main():
    # launch chromium browser in the background
    browser = await launch()
    # open a new tab in the browser
    page = await browser.newPage()
    # navigate the tab to the URL and wait for the page to load
    await page.goto("https://www.python.org/")
    # create a screenshot of the page and save it
    await page.screenshot({"path": "python.png"})
    # close the browser
    await browser.close()

print("Starting...")
asyncio.run(main())
print("Screenshot has been taken")

Complete code to look up the phone numbers associated with a cédula (national ID number).

import asyncio
from pyppeteer import launch, errors

URL = "https://demophone.com/consulta-lineas"

async def main():
    cedulas = ["40221818848", "00221818848", "40221818699"]
    for count, cedula in enumerate(cedulas, start=1):
        print(count)
        print("Cedula: " + cedula)
        # launch a fresh browser for each lookup
        browser = await launch()
        try:
            page = await browser.newPage()
            print("Opening URL: " + URL)
            await page.goto(URL)
            # type the cedula into the search field and submit the form
            await page.type("#exampleInputEmail1", cedula)
            await page.click(".card-body > .row > .col-xl-9 > div > .btn")
            try:
                # wait up to 1 second for the list of phone numbers to appear
                element = await page.waitForSelector('.card-body > .d-flex > .d-flex > ul', timeout=1000)
                phones = await element.getProperty('innerText')
                print(await phones.jsonValue())
            except errors.TimeoutError:
                # the list never appeared, so no numbers were returned
                print("No phone numbers found")
        finally:
            # close the browser even if the lookup fails
            await browser.close()

print("Starting...")
asyncio.run(main())
print("Done")
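The script above prints the list's innerText as one block of text. As a sketch of a possible next step, the helper below splits that text into one cleaned entry per line. The name parse_phone_list is hypothetical, the sample numbers are made up, and the assumption that the site renders one phone number per list item has not been verified.

```python
def parse_phone_list(inner_text):
    """Split a list element's innerText into stripped, non-empty lines."""
    return [line.strip() for line in inner_text.splitlines() if line.strip()]


# Made-up sample imitating the innerText of a <ul> with two items
SAMPLE_TEXT = "809-555-0101\n  829-555-0102 \n\n"

if __name__ == "__main__":
    for phone in parse_phone_list(SAMPLE_TEXT):
        print(phone)
```

Returning a list rather than printing raw text makes it easier to store the results per cédula or to write them to a file later.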


mfalconsoft@gmail.com / +34 (662) 47 0645