Requests-HTML - Checking out a new HTML parsing library for Python

Checking out a new HTML-parsing library by the author of Requests:

Comments

I like this type of video. You should do a monthly video on a new module so people are aware of them. This would be very useful for people learning Python.

thehungman

man, i'm mixing that HTML parsing sauce with my beautiful soup right now

rumidom

I think the // for retry in range(100): // part is what is allowing the script to continue after raising the error. From their doc: "The simplest use case is retrying a flaky function whenever an Exception occurs until a value is returned." So this would allow the exception to be printed, yet the script to continue I believe. Great content man, thanks for all of the awesome videos :).

cooperlimond
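
The retry behaviour described in the comment above can be sketched as follows. This is a minimal illustration of the general pattern, not the code from the video or from any particular library; `make_flaky` and `call_with_retries` are hypothetical names introduced here.

```python
def make_flaky(failures):
    """Hypothetical stand-in: build a function that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def flaky():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("transient failure")
        return "ok"

    return flaky


def call_with_retries(func, attempts=100):
    """Call func until it returns a value, printing each exception along the way."""
    last_error = None
    for retry in range(attempts):
        try:
            return func()
        except Exception as exc:
            # The exception is reported, but the loop carries on to the next attempt.
            last_error = exc
            print(f"attempt {retry + 1} failed: {exc!r}")
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


result = call_with_retries(make_flaky(2))
print(result)
```

Because the `except` branch only prints and lets the loop continue, the error shows up in the output while the script keeps running, which matches what the comment describes.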


So basically they just have a list of symbols like 'more', 'next' or 'older' and look for their hrefs.

So on HN page 2, the title from the CNBC story has the word 'more' in it. Haha.

However, there are statistical methods for working out how a page handles pagination, but I guess that's a bigger nut to crack for such a young library :)

yokoono
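
A rough sketch of the heuristic described in the comment above, using only the standard library. It is an assumption-laden illustration and not the library's actual implementation: the symbol list is taken from the comments, and the sample HTML (including the story title) is made up for the example.

```python
from html.parser import HTMLParser

# Symbols quoted in the comments; treated here as an illustrative default.
NEXT_SYMBOLS = ["next", "more", "older"]


class AnchorCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def guess_next_links(html, symbols=NEXT_SYMBOLS):
    """Return hrefs of links whose text contains any of the 'next page' symbols."""
    parser = AnchorCollector()
    parser.feed(html)
    return [
        href
        for href, text in parser.links
        if any(symbol in text.lower() for symbol in symbols)
    ]


# Hypothetical page: a story title that happens to contain "more", plus the real link.
page = """
<a href="https://example.com/story">Why investors want more from tech stocks</a>
<a href="news?p=3">More</a>
"""
candidates = guess_next_links(page)
print(candidates)
```

The story link is picked up alongside the genuine "More" link, which is the kind of false positive the comment points out.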

Multiple classes in HTML are separated by spaces.
In CSS selectors, they are joined with a '.' (dot) instead.
For example:
<div class="foo bar"> would be referenced as about.find("div.foo.bar")
Also, '(' and ')' are not valid in a CSS selector unescaped, so you have to escape them with a backslash.

mshirazab
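
The rule in the comment above can be turned into a small helper. This is a simplified sketch: `css_escape_ident` and `selector_for` are hypothetical names, the class name `(usd)` is invented for the example, and the escaping only covers ASCII punctuation (it does not handle class names that start with a digit).

```python
import re


def css_escape_ident(name):
    """Backslash-escape every character outside [A-Za-z0-9_-] (simplified rule)."""
    return re.sub(r"([^A-Za-z0-9_-])", r"\\\1", name)


def selector_for(tag, class_attr):
    """Turn a tag name and a space-separated class attribute into a CSS selector."""
    classes = class_attr.split()
    return tag + "".join("." + css_escape_ident(c) for c in classes)


print(selector_for("div", "foo bar"))
print(selector_for("td", "value (usd)"))
```

The resulting string can then be passed to a selector-based lookup such as the `about.find(...)` call mentioned in the comment.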

Best way to remove error
-> comment out raise statement. 😁😂

rohnchatterjee

This is cool! And it gave me the idea of a series of videos about how to create a python package

Lucas-wlpy

Hello!
Recently I have been trying to parse some webpages asynchronously with Requests-HTML. In theory this can be done with AsyncHTMLSession. However, most of the time I am unable to get a result from it (I also use arender, and the attempts to parse the webpages fail for different reasons, most probably timeouts). Maybe it's just a poor internet connection, but I'd be really grateful if you uploaded a video on this or could help me with it.

anyad
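
One general way to cope with the timeouts described above is to cap concurrency and retry each page a few times. The sketch below uses only `asyncio`; `fake_fetch` is a hypothetical stand-in for the real work, and the assumption is that in practice it would be replaced by a coroutine that fetches and renders a page with the asynchronous session. The helper names and default values are illustrative, not part of any library.

```python
import asyncio


async def fetch_with_retry(fetch, url, timeout=1.0, attempts=3):
    """Await fetch(url) with a per-attempt timeout; return None if every attempt times out."""
    for attempt in range(1, attempts + 1):
        try:
            return await asyncio.wait_for(fetch(url), timeout=timeout)
        except asyncio.TimeoutError:
            print(f"{url}: attempt {attempt} timed out")
    return None


async def fetch_all(fetch, urls, limit=2, **kwargs):
    """Fetch all urls, running at most `limit` fetches at the same time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fetch_with_retry(fetch, url, **kwargs)

    # gather() returns results in the same order as the input urls.
    return await asyncio.gather(*(bounded(u) for u in urls))


async def fake_fetch(url):
    """Hypothetical stand-in: URLs containing 'slow' take far longer than the timeout."""
    await asyncio.sleep(1.0 if "slow" in url else 0.01)
    return f"<html>{url}</html>"


results = asyncio.run(
    fetch_all(fake_fetch, ["a", "slow", "b"], timeout=0.2, attempts=2)
)
print(results)
```

Lowering `limit` reduces how many pages are in flight at once, which may help on a slow connection, at the cost of a longer total run.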

Thank you, sentdex, you are leading me to the real world from Africa

LolLol-wyfp

Thank you so much! I got the answer based on your guide. *nice helmet ⛑ you got back there*

shazkingdom

The function couldn't clean up the user data because the files were locked by the Chromium process.

MohamedMagdyHammad

18:52 I ‘think’ replacing the spaces with periods should fix the period error, no idea about the parentheses, though. Backslash or HTML escape?

Hans-jcju

Sentdex! Can you show scraping from a page with a "show more" button that loads more of the page with JavaScript?

SimOn-bzxy

Make another video building a crawler using it. Nice video!

SkySesshomaru

Strange, I have installed requests_html but when I import it in a Python script in Python 2.x or 3.7, I get: ModuleNotFoundError: No module named 'requests_html'

Hegelian
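
An error like the one above often means the package was installed for a different interpreter than the one running the script. A small diagnostic sketch, using only the standard library, can show which interpreter is active and whether it can see the module; `describe_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and whether it can locate `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "version": ".".join(map(str, sys.version_info[:3])),
        "module": name,
        "found": spec is not None,
        "location": getattr(spec, "origin", None),
    }


info = describe_module("requests_html")
for key, value in info.items():
    print(f"{key}: {value}")

if not info["found"]:
    # Installing through the same interpreter avoids mixing up environments.
    print(f"Try: {sys.executable} -m pip install requests-html")
```

Running this from the same place the failing script is launched (for example, an editor's build system) shows which Python that tool actually uses.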

What is the extension that prints the result down at the bottom?

developerarchitect

To find td's or other elements that have multiple classes, you would just have had to put dots between the class names. Read up on CSS selectors; jQuery also uses them. They're pretty standard nowadays and less of a headache than XPaths ;)

WhiterockFTP

Can I get some help with installing the "requests-html" package so that it can be used globally, for example from Sublime Text?
I am using Conda on Windows 10.

I have been trying to do that, but as far as I understand, it only runs in a virtual environment that Sublime cannot use? Correct me if I am wrong.

pyxelr

Please make a video building a webcrawler, would be very insightful!

SimonEliasen

Thanks for posting this. I've used BS4 and another module to do the JavaScript (render the page) on many projects, it's nice to have it in a concise package.

Btw, I think the pagination on Hacker News failed because it looks for one of three (by default) "next" labels: "next", "more" and "older" (DEFAULT_NEXT_SYMBOLS). The CNBC link has "more" in it.

kylek