Web Scraping NEWS Articles with Python

How I go about web scraping news articles, in this case from Google News. The page is dynamically loaded, but we can use requests_html to render it for us, which gives us access to the elements and their data. I run through a short example of how this works and point out some pitfalls along the way.
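A minimal sketch of the approach described above, assuming `requests-html` is installed (`pip install requests-html`). The CSS selectors `article` and `h3` are assumptions about Google News markup, which changes over time, so check them in your browser's inspector first. The download/render step is kept in its own function so the parsing step can be tried on anything exposing the same `find()`, `text` and `absolute_links` interface.

```python
def fetch_rendered_page(url, scrolldown=0):
    """Download `url` and let requests_html execute its JavaScript.

    render() drives a headless Chromium (downloaded on first use) and
    replaces the page's HTML with the post-JavaScript DOM.
    """
    from requests_html import HTMLSession  # third-party: pip install requests-html

    session = HTMLSession()
    r = session.get(url)
    # scrolldown: number of PageDown presses; sleep: seconds to wait (after each press when scrolling)
    r.html.render(sleep=1, scrolldown=scrolldown)
    return r.html


def extract_articles(page, article_selector="article", title_selector="h3"):
    """Return a list of {'title', 'link'} dicts from a rendered page.

    The default selectors are assumptions, not a documented Google News structure.
    """
    results = []
    for item in page.find(article_selector):
        headline = item.find(title_selector, first=True)
        if headline is None:
            continue  # skip blocks that have no headline element
        # absolute_links is a set; a block may hold several links, and this
        # sketch simply keeps the alphabetically first one
        links = sorted(item.absolute_links)
        results.append({"title": headline.text, "link": links[0] if links else None})
    return results
```

Usage, with a network connection: `html = fetch_rendered_page("https://news.google.com/", scrolldown=5)` followed by `extract_articles(html)`. The first render can be slow because Chromium is downloaded then.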


Comments
Author

Very useful video John. Keep them coming. If you had made this video a day earlier it would have saved lots of my time. But for the future it's a good reference.

jimmysonerian
Author

Awesome vid with easy to understand explanations! Thanks John. Would you ever consider adding a script that would then open each article and scrape the contents? That would be super useful to see!

parsairani
Author

This was really helpful. Thanks a lot!!

anirudhnuti
Author

Thank you for the great video. How can I scrape the news from every page, not only page 1?

ma.t.t.
Author

John, thanks for making everything so easily accessible. Going through this step by step has (a) worn out my pause/play finger but (b) allowed me to understand how you’ve been building this up. Having followed it through, I have a couple of questions as a ‘first timer’. I’m using Python from within Anaconda and VS Code, but ‘render’ isn’t turning blue and it’s telling me “newsarticle” is not defined. Any suggestions? I have to admit I’m on a MacBook Pro, but everything else seems to be fine. Thanks John.

scg
Author

Thank you. But I have this error "Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead."

chileendatos
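That error usually means the code is running where an asyncio event loop is already active, most commonly a Jupyter notebook. A sketch of the async variant the message points to, assuming `requests-html` is installed: `AsyncHTMLSession` and `arender()` are the awaitable counterparts of `HTMLSession` and `render()`. The function name is hypothetical.

```python
async def fetch_rendered_page_async(url, scrolldown=0):
    """Async fetch-and-render for environments that already run an event loop (e.g. Jupyter)."""
    from requests_html import AsyncHTMLSession  # third-party: pip install requests-html

    asession = AsyncHTMLSession()
    r = await asession.get(url)
    # arender() is the awaitable version of render()
    await r.html.arender(sleep=1, scrolldown=scrolldown)
    return r.html
```

In a notebook cell you can then write `html = await fetch_rendered_page_async("https://news.google.com/")`. In a plain .py script the synchronous `HTMLSession` works as is.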
Author

I can't even start: "No module named 'requests_html'". Please help me.

petkomarinov
Author

Thanks for the video. Can I use the same code to return only articles with a specific name in the header?

sanadmasoud
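One way to do this is to filter after scraping rather than changing the scraper. A minimal sketch, assuming the results are a list of dicts with `title` and `link` keys (a hypothetical shape chosen for this example); the helper name is also hypothetical.

```python
def filter_by_headline(articles, keyword):
    """Keep only articles whose title contains `keyword`, ignoring case."""
    needle = keyword.casefold()  # casefold() is a more thorough lower() for matching
    return [a for a in articles if needle in a["title"].casefold()]
```

Filtering a plain list keeps the scraping code unchanged, so you can try several keywords on one scrape without rendering the page again.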
Author

Great video sir. How can we modify this to save the results in a well-structured spreadsheet?

augastinendeti
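A CSV file opens directly in Excel or Google Sheets, and the standard library can write one. A minimal sketch, assuming the scraped results are a list of dicts with `title` and `link` keys (a hypothetical shape); the function name is hypothetical too.

```python
import csv


def save_to_csv(articles, path):
    """Write article dicts to `path` as CSV with a title,link header row."""
    # newline="" lets the csv module handle line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" silently drops any keys other than title/link
        writer = csv.DictWriter(f, fieldnames=["title", "link"], extrasaction="ignore")
        writer.writeheader()
        writer.writerows(articles)
```

The csv module quotes fields that contain commas, which headlines often do. If you already use pandas, `pandas.DataFrame(articles).to_csv(path, index=False)` does the same job.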
Author

Hey, I got an error in render when I run this:
AttributeError: 'coroutine' object has no attribute 'newPage'
RuntimeWarning: coroutine 'launch' was never awaited

utkarshtyagi
Author

I am getting the same number of articles whether I use scrolldown=0 or scrolldown=5.
Can anyone explain why?

shubhamsaxena
Author

Great video John! Can you tell me what html.render() technically does to our program?

ismaelRR
Author

Thanks sir, but I think for only the top headline we can just use bs4 and return the text of the first h1 tag.

AshishBangwal
Author

I am getting this RuntimeError: Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead.

manasimalbari
Author

My list seems to stop at 100 articles? Is there a way to circumvent this?

jfqlkd
Author

Great video! Would it be possible to scrape the whole content of the news? I am doing a project on fake news detection and I would need the whole content :)

martinabozzi
Author

Where is the content? If I want to open each article and scrape content like the title, how do I do that?

km-coding
Author

How can I get the content of the news rather than the link?

WalterWhite-kvjt
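Several comments ask for the full article text. One hedged approach, swapping in the standard library instead of requests_html: download each article link and collect the text of its `<p>` elements. This is only a heuristic. Some sites put captions or ads in `<p>` tags, some load the body with JavaScript (which would need the render step again), and Google News links may not lead straight to the publisher's page. All names below are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements as a rough stand-in for the article body."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self._depth = 0      # > 0 while inside a <p>
        self._current = []   # text fragments of the paragraph being read
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # collapse runs of whitespace and newlines into single spaces
                text = " ".join("".join(self._current).split())
                if text:
                    self.paragraphs.append(text)
                self._current = []

    def handle_data(self, data):
        if self._depth:
            self._current.append(data)


def paragraphs_from_html(html):
    """Return the non-empty <p> texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def fetch_article_text(url, timeout=10):
    """Download `url` and return its paragraph text separated by blank lines."""
    # some sites reject urllib's default User-Agent, so send a browser-like one
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return "\n\n".join(paragraphs_from_html(html))
```

Looping `fetch_article_text(link)` over the scraped links gives one text per story. Add a short pause between requests to be polite to the sites.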
Author

Did anyone else notice how my man wrote 'kink' real quick?

aberema
Author

I ended up getting duplicates in my list for some reason. Each story title and link is listed at least 5 times.

SunDevilThor
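Duplicates like this can appear when the same story is shown in several sections of the page, or when a broad selector matches both an element and a wrapper around it. Those are likely causes, not confirmed ones. Whatever the cause, a minimal order-preserving clean-up, assuming results are dicts with `title` and `link` keys (a hypothetical shape), is:

```python
def dedupe_articles(articles):
    """Drop repeated (title, link) pairs while keeping first-seen order."""
    seen = set()
    unique = []
    for a in articles:
        key = (a["title"], a["link"])  # tuples are hashable, so they can go in a set
        if key not in seen:
            seen.add(key)
            unique.append(a)
    return unique
```

Keying on both title and link keeps two different stories that happen to share a headline.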