How regexes got catastrophic

This video has a page on 0DE5 with exercises and resources

Chapters
00:00 - Intro
03:32 - Backtracking and lockstep
13:48 - ReDoS
25:34 - Backreferences
50:00 - The prime number regex
28:38 - Exercises
Comments

i understand nothing but lowkey enjoying what is happening

focando-lol

Please don't stop uploading. I found this channel by chance and it has to be one of the best things happening this year

yaboi

I now understand why i'm good at regex where most people fail
I never use backreferences
Like idk why but they never felt right logically
Even when i made crazy things like a youtube URL parser to clean those in regex, i've found ways to just do it without backreference when i could have used some
And it's kinda cause when i'm building the regex, i'm running it in my mind and backreference just makes that impossible
Like tracking what it does becomes too complex
So big thanks for this vid, very informative and great!

Julienraptor

YT algorithms just decided that my constant work with regex deserves this video. Thank you algorithm.
It was very pleasant to watch.

wojciechostrowicz

FYI, Rust's standard regex library uses an O(mn) algorithm without backreferences.

LeeDanielCrocker

I can't believe this video has only 31K views. All this work: the amazing visualizations, the quality of the explanations, the linked exercises in the description. I truly hope all this work will be rewarded some day. Thank you so much.

rkvkydqf

i got a (.*?) tattoo when i barely knew what regexes were lol. thank you so much for this series, it has done wonders for my ability to live with that decision (and is also some of the best comp sci content I've ever seen on this platform <3 <3 <3)

_-_-_-_-_swo

Reminded of what JWZ said: "Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems."

JasonWalter

Where [regex] blood makes [regex] blood unclean! First-time watcher and this was great.

erik_with_a_kay

I studied this 4 years ago at uni and totally forgot this is how we reason about regexes! Super helpful video thank you!

sardonyx

I've always had a sneaking suspicion that regexes were too versatile and did too many things to actually be performant, turns out I was right

though I did not expect back references to break the entire concept of a language being regular

will probably use Google's regex engine in the future since I barely ever use backreferences anyway

AstonishedByTheLackOfCake

Why not use the lockstep algorithm, when the regex has no backreference? It would be easy to just store a boolean with the automaton, that indicates whether it has backreferences, and pick the algorithm accordingly. This would limit catastrophic cases to regexes that actually use backreferences, which could be taught as something to be avoided.

NikiA_
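
The dispatch idea in the comment above can be sketched roughly as follows. This is only an illustration, not how any particular engine works: `has_backreference` and `choose_engine` are hypothetical names, and the scan assumes Python-style syntax, where a backreference is written `\1` to `\9` or `(?P=name)`.

```python
def has_backreference(pattern: str) -> bool:
    # Rough scan (an assumption-laden sketch, not a real regex parser):
    # report a backslash followed by 1-9, or "(?P=", outside a character class.
    i = 0
    in_class = False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            nxt = pattern[i + 1] if i + 1 < len(pattern) else ""
            if not in_class and nxt != "" and nxt in "123456789":
                return True
            i += 2  # skip the escaped character
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
            # "^" and a leading "]" right after "[" do not close the class
            if pattern[i + 1:i + 2] == "^":
                i += 1
            if pattern[i + 1:i + 2] == "]":
                i += 1
        elif pattern.startswith("(?P=", i):
            return True
        i += 1
    return False


def choose_engine(pattern: str) -> str:
    # The "boolean stored with the automaton" from the comment:
    # only fall back to backtracking when a backreference forces it.
    return "backtracking" if has_backreference(pattern) else "lockstep"


print(choose_engine(r"(a+)+b"))   # lockstep
print(choose_engine(r"(a)\1"))    # backtracking
```

Under this scheme a pattern such as `(a+)+b` would never reach the backtracking path, so the catastrophic cases would be confined to patterns that actually contain a backreference.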

10:25 this transition cracked me up haha great video!

bntegor

My personal position is that once one has spent a few hours trying to convince various HTML parsing libraries to _only_ parse the input string and stop reformatting it, parsing with a regex starts looking pretty good.

somdudewillson

I use a regex to finite state machine code generation tool occasionally for tricky problems, and it's always violently obvious when I've accidentally added nondeterminism to my input regex - the state machine that comes out of the other side blows up spectacularly.

JamesChurchill

This is amazing!! I read about Thompson's algorithm last week when I was studying non-deterministic automata and the fact that regex engines in most modern and/or popular programming languages are slower than it and suffer from exponential blowup for longer expressions (if I remember correctly). The visualizations of the algos were great and helpful in understanding them. Thumbs up for that!!

All this increases my respect for these giants: programming all these using ed, the standard editor, on a teletype connected to a computer which was much slower than our present day handheld gadgets.

dr.strangelove
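
For readers who have not seen the lockstep (Thompson-style) simulation mentioned above, here is a minimal sketch. The NFA is written out by hand for the pattern `(a|aa)+b`, without epsilon transitions, and the state numbers, the `NFA` table, and the function name `lockstep_match` are all illustrative assumptions.

```python
# Hand-built NFA for (a|aa)+b, full match only (assumed encoding):
#   0 = start, 1 = just finished an alternative,
#   2 = halfway through "aa", 3 = accepted after "b"
NFA = {
    (0, "a"): {1, 2},
    (1, "a"): {1, 2},
    (1, "b"): {3},
    (2, "a"): {1},
}
START = 0
ACCEPTING = {3}


def lockstep_match(nfa, start, accepting, text):
    # Keep the set of all states the NFA could be in and advance
    # every one of them together, one input character at a time.
    active = {start}
    for ch in text:
        active = {t for s in active for t in nfa.get((s, ch), ())}
        if not active:
            return False  # no thread survived this character
    return bool(active & accepting)


print(lockstep_match(NFA, START, ACCEPTING, "aab"))     # True
print(lockstep_match(NFA, START, ACCEPTING, "a" * 40))  # False
```

The active set can never hold more states than the NFA has, so the work per character is bounded by the size of the automaton. A backtracking matcher given the same pattern and a long run of `a` with no `b` would instead retry the different ways of splitting the run into `a` and `aa` pieces.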

What a great overview of this! Great refresher of things I haven't thought much about since college, and explained more concisely than any of my professors managed to.

lag

Thank you for sharing the stackoverflow answer, I’ve never felt so seen

faldarith

I'm really drunk and I don't know what you're talking about but I'm enjoying it

nolanmccarthy

Excellent rundown. I've been using RE2 for years. You might have mentioned that a Thompson non-deterministic automaton can be converted to a deterministic one, where at most one state is active at a time. See "subset construction." This is what lex/flex and similar tools do. Run time is linear in the input length. Zero penalty for number of states. Of course there's no free lunch. Pathologically bad regex'es can yield minimum machines (yes the algorithm can yield the unique minimum-state machine) still exponential in regex size. But unless you are doing something silly like computing regexes on the fly, this is easily caught at compile time. There exist libraries that provide such recognizers (vice flex et al that generate code).

generessler
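
The subset construction mentioned in the comment above can be sketched in a few lines. This is a simplified illustration, not what lex/flex actually emit: it assumes an NFA without epsilon transitions, given as a dictionary from `(state, symbol)` to a set of states, and the names `subset_construction` and `dfa_match` are hypothetical. The example NFA accepts strings over `a` and `b` that end in `ab`.

```python
from collections import deque


def subset_construction(nfa, start, accepting, alphabet):
    # Each DFA state is a frozenset of NFA states that could be active together.
    start_set = frozenset([start])
    dfa = {}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue
        dfa[current] = {}
        for symbol in alphabet:
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            dfa[current][symbol] = target
            if target not in dfa:
                queue.append(target)
    dfa_accepting = {state for state in dfa if state & accepting}
    return dfa, start_set, dfa_accepting


def dfa_match(dfa, start, accepting, text):
    # Exactly one state is active; one table lookup per input character.
    # Assumes text only contains symbols from the alphabet.
    state = start
    for ch in text:
        state = dfa[state][ch]
    return state in accepting


# NFA for "ends in ab": 0 loops on a/b, guesses the final "ab" via 1 and 2.
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa, dfa_start, dfa_accepting = subset_construction(nfa, 0, {2}, "ab")

print(len(dfa))                                        # 3
print(dfa_match(dfa, dfa_start, dfa_accepting, "aab")) # True
```

Here the three NFA states yield only three DFA states, but in the worst case the number of subsets grows exponentially with the size of the NFA, which is the compile-time cost the comment refers to.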