Intro to async Python | Writing a Web Crawler


async/await and asyncio.

Asynchronous programming in Python uses the async/await keywords and the asyncio library. It's a style of programming that takes advantage of the fact that it's possible to wait for many things at the same time, so it sees use particularly in libraries that do heavy IO, like writing to disk or waiting on packets from the network. In this video, we are introduced to the basics of the syntax, and then get into an implementation of a basic web crawler.
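
As a rough illustration of "waiting for many things at the same time", here is a minimal sketch. It is not code from the video: the names wait_for and main are hypothetical, and asyncio.sleep stands in for real IO such as a disk write or a network request.

```python
import asyncio


async def wait_for(name: str, delay: float, finished: list[str]) -> str:
    # asyncio.sleep is a stand-in for real IO (assumption for this sketch).
    await asyncio.sleep(delay)
    finished.append(name)
    return name


async def main() -> tuple[list[str], list[str]]:
    finished: list[str] = []
    # gather() runs all three coroutines concurrently. Its results come back
    # in the order the coroutines were passed in, not in completion order.
    results = await asyncio.gather(
        wait_for("disk", 0.15, finished),
        wait_for("network", 0.05, finished),
        wait_for("timer", 0.10, finished),
    )
    return list(results), finished


if __name__ == "__main__":
    results, finished = asyncio.run(main())
    print("results in call order:", results)
    print("completion order:", finished)
```

The three waits overlap, so the whole run takes roughly as long as the slowest one rather than the sum of all three.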

SUPPORT ME ⭐
---------------------------------------------------
Sign up on Patreon to get your donor role and early access to videos!

Feeling generous but don't have a Patreon? Donate via PayPal! (No sign up needed.)

Want to donate crypto? Check out the rest of my supported donations on my website!

Top patrons and donors: Jameson, Laura M, Dragos C, Vahnekie, Neel R, Matt R, Johan A, Casey G, Mark M, Mutual Information

CHAPTERS
---------------------------------------------------
0:00 Intro
0:14 The idea
1:35 Async/await and asyncio
5:58 Writing a web crawler
13:18 Thanks to Brilliant
14:04 The end
Comments

Nice, but I find your example too complicated for a video. And I'm an experienced async programmer.
I can follow, but at least a diagram with the queues would be helpful for global comprehension.
And a diagram of how the ioloop schedules coroutines, because there are a lot of assumptions and I believe it's hard for beginners to follow without visual cues.

unperrier

The idea of explaining async by printing out real world tasks is genius

thepaulcraft

having additional things to try out at the end of the video is awesome. your content is always so great! a sincere thank you :)

packetpunter

I did not know it had a built-in HTML parser! Then why do so many people reach for Beautiful Soup? Can you do a comparison and explain whether they are used for the same goals?
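
For context on the built-in parser mentioned here, this is a minimal sketch of collecting links with html.parser from the standard library. The names LinkCollector and extract_links are hypothetical, and whether the video uses the parser in exactly this way is an assumption.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(page: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(page)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = '<p><a href="/a">A</a> <a name="x">no link</a></p>'
    print(extract_links(sample))
```

The standard-library parser is event based: you override handler methods and build whatever structure you need yourself, rather than getting a navigable tree back.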

unusedTV

10:54 You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the n​erves of the sentient whilst you observe, your psyche withering in the onslaught of horror. 
Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the transgression of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of reg​ex parsers for HTML will ins​tantly transport a programmer's consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection wil​l devour your HT​ML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fi​ght he com̡e̶s, ̕h̵i​s un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo​͟ur eye͢s̸ ̛l̕ik͏e liq​uid pain, the song of re̸gular exp​ression parsing will exti​nguish the voices of mor​tal man from the sp​here I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful t​he final snuffing of the lie​s of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL I​S LOST the pon̷y he comes he c̶̮omes he comes the ich​or permeates all MY FACE MY FACE ᵒh god no NO NOO̼O​O NΘ stop the ͎a̧͈͖r̽̾̈́͒͑e n​ot rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘

probaddie

Great video. For a while I have been hoping that I would just stumble upon a Python-specific async tutorial.

Cucuska

Wow, that is some great content. But for me some of the web crawling concepts are a little distracting. It would be so great if you could add a video on async work for data pipelines (for example, get some CSV files from an FTP server, transform them, and bulk load them into a relational DB, including logging, task handling, and retry logic). I think this would be a great lesson for many data scientists who are not really educated in data engineering topics. Anyhow, thanks for the homework, I hope I will not fail it :-)

ali-omuv

4:46 You can have a nonempty pending list even in the absence of a timeout. There is another optional argument to asyncio.wait() which specifies whether to wait for all tasks to complete (the default), or just the first one.
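
A minimal sketch of that point, using the return_when argument of asyncio.wait(). The helper names work and main and the delays are made up for illustration.

```python
import asyncio


async def work(delay: float) -> float:
    await asyncio.sleep(delay)
    return delay


async def main() -> tuple[int, int]:
    tasks = [asyncio.create_task(work(d)) for d in (0.05, 0.5, 0.5)]
    # No timeout is passed, yet pending is non-empty because we ask
    # wait() to return as soon as the first task finishes.
    done, pending = await asyncio.wait(
        tasks, return_when=asyncio.FIRST_COMPLETED
    )
    counts = (len(done), len(pending))
    # Cancel the leftovers so they do not outlive this example.
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return counts


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With the default, return_when=asyncio.ALL_COMPLETED, and no timeout, pending would come back empty.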

lawrencedoliveiro

The best advice I can give to anyone regarding python's async/asyncio is to read the documentation, as there are a lot of edge cases and it's well documented.

For example, you need to keep a reference to the objects returned by create_task (in a class variable, for instance), or they can get garbage collected before they finish.
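
One common way to do this is sketched below. The asyncio documentation notes that the event loop keeps only weak references to tasks, so something else has to hold a strong one. The names background_tasks, spawn and job are hypothetical; a module-level set is used here, but a class or instance attribute works the same way.

```python
import asyncio

# Strong references to in-flight tasks live here until they finish.
background_tasks: set[asyncio.Task] = set()


async def job(results: list[int], value: int) -> None:
    await asyncio.sleep(0.01)
    results.append(value)


def spawn(coro) -> asyncio.Task:
    task = asyncio.create_task(coro)
    background_tasks.add(task)
    # Drop the reference once the task is done so the set does not grow.
    task.add_done_callback(background_tasks.discard)
    return task


async def main() -> list[int]:
    results: list[int] = []
    for i in range(3):
        spawn(job(results, i))
    # Wait for everything that was spawned before returning.
    await asyncio.gather(*background_tasks)
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```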

brycedevoe

2:53 time.sleep() blocks the current thread, asyncio.sleep() only blocks the current task.

Remember that a “task” is not a concept of the Python language itself: it is purely a figment of the asyncio library -- it is a wrapper around a coroutine object, which _is_ a concept of the Python language. Tasks are schedulable entities managed by asyncio.
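
A small sketch of the difference, recording the order of events instead of measuring time. The names polite, rude and run_pair are made up for this example.

```python
import asyncio
import time


async def polite(name: str, log: list[str]) -> None:
    log.append(f"{name} start")
    await asyncio.sleep(0.05)  # suspends only this task; others may run
    log.append(f"{name} end")


async def rude(name: str, log: list[str]) -> None:
    log.append(f"{name} start")
    time.sleep(0.05)  # blocks the whole thread, event loop included
    log.append(f"{name} end")


async def run_pair(worker) -> list[str]:
    log: list[str] = []
    # gather() wraps each coroutine object in a Task and schedules both.
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


if __name__ == "__main__":
    print(asyncio.run(run_pair(polite)))
    print(asyncio.run(run_pair(rude)))
```

With asyncio.sleep() both tasks start before either one ends. With time.sleep() task "a" runs to completion before "b" gets a chance to start, because the event loop never regains control in between.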

lawrencedoliveiro

The syntax seems to have improved a bit since I last tried async programming in Python 3.6.
Still, JavaScript has the best syntax for async programming in my opinion.

royz_

Dude your videos are so great - making hard stuff easy - thanks!

CritiKaster

While async is not threading, it would be great if James could clarify how different tasks may communicate properly, for example what happens with variables shared between tasks.

Graham_Wideman

I've written asyncio applications, and it quickly becomes complicated once you mix multiprocessing (external processes having their own ioloop) with listening on sockets and pipes (each having a slightly different implementation that requires you to write different code). You have to manage everything yourself and navigate abstractions that do the same things but are different (reading lines coming from sockets vs pipes).
To give a comparison, it's as if Python only exposed an API for threading where you have to create the thread yourself, create local storage yourself, and schedule work to run in the thread via a queue, for every thread, all by yourself. That'd be a terrible API to use... well, asyncio is at that level imo, not high enough (yet) to be awesome.

unperrier

Wow, this is crazy similar to Rust! The difference is pretty much only function names. And such a clear explanation with a good analogy, I can share this with Rust newbies.


Thank you...the idea of waiting on multiple things at once is what made it "click" for me

quitethecontrary

This channel is like top 3 channels for Python on the web.
Thanks for this and keep it up!

Phaust

wow, this is by far the best asyncio tutorial I have encountered. thank you!

WebMafia

Yes, HTML is not a regular language; however, your problem might be. For example, if you are searching for the only id="..." on the page, you can totally do that with a regex.
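
A minimal sketch of that narrow case, assuming the page contains exactly one double-quoted id attribute. The function name find_only_id and the sample markup are hypothetical, and this is not a general HTML parser.

```python
import re

# Match id="..." but not attributes such as data-id="..." or grid="...".
ID_PATTERN = re.compile(r'(?<![\w-])id="([^"]*)"')


def find_only_id(page: str):
    """Return the value of the first id="..." attribute, or None."""
    match = ID_PATTERN.search(page)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = '<div class="x"><p id="main-title">Hello</p></div>'
    print(find_only_id(sample))
```

This holds up only as long as the assumptions do: single-quoted or unquoted attribute values, or an id="..." string inside a comment or script, would need a real parser.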

efaile

Nice intro, a bit fast into those queues.

DanielMaidment