The Insanity Of Linux's Regular Expressions

Comments

"The plural of Regex is Regrets."

lockonjunkie

As the old saying goes:

"Hey, I know, I'll use Regex to solve this problem." Now you have two problems.

stbuchok

Thank you! I’ve been saying this for 20 years and what I’ve been told is that every regex has to have these subtle differences because “they solve different problems!”

diego

And then Rust thought they should invent their own RE specification, just to mess with me, I'm sure.

swanyriver

I first really learned Regular Expressions from the terrific O'Reilly "Owl" book, Mastering Regular Expressions by Jeffrey E. F. Friedl, second edition (my favorite, though the third edition is equally excellent). I reread it rather often.
Yeah, he goes into a lot of detail, but explains the "standards" are just sort of really strong suggestions.
The great thing about standards is that there are so many!
He suggests picking your favorite tools and learning the idiosyncrasies of how Regular Expressions are used by them. He also recommends using Perl, because it's clearly by far the leader in Regular Expressions.
Perl was designed to replace ALL the various command line tools you mentioned.
The book does explain that the way they work is in a relatively uniform or at least predictable manner, within just a few different "flavors."
The moral of the story, as always, is to carefully test any pattern before actually putting it into production.
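The "test before production" moral is easy to demonstrate: the very same pattern text can match different lines depending on the flavor. A minimal sketch with GNU grep (BRE by default, ERE with -E):

```shell
# Same pattern 'a+', same input, different flavor, different match:
printf 'aaa\na+\n' | grep -x 'a+'      # BRE: '+' is a literal, matches the line "a+"
printf 'aaa\na+\n' | grep -xE 'a+'     # ERE: '+' means one-or-more, matches "aaa"
```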

lorensims

Phew. I was losing my mind with RegEx and VSCode the other day... I gave up. Glad it wasn't my fault.

delicious_seabass

Love the thumbnail, you have the same facial expression my best friend had when I asked him if he was my best friend.

stalker

And in POSIX shell, you have shell expansions and glob patterns to add to the confusion.
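A quick illustration of why the two pattern languages trip people up: `*` means "any string" in a glob but "zero or more of the previous atom" in a regex. A small sketch:

```shell
# Glob: 'a*' means "a followed by anything"
case abc in a*) echo 'glob matches abc';; esac
# Regex: 'a*' means "zero or more a's", so it matches even a line with no a at all
printf 'xyz\n' | grep -c 'a*'    # prints 1: the empty match is enough
```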

ngelf

I use Perl, and yes, being modern it works. But put a '?' (lazy, not a grouping) after the '+' (greedy) in a regular expression that matches globally rather than returning on the first match, and you get different and mostly unexpected results. I expected '?' to be the more proper one.
Example:
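(The original example didn't survive the scrape; below is a minimal sketch of the greedy-vs-lazy difference described above, using grep -P as a stand-in for Perl and assuming GNU grep built with PCRE support.)

```shell
# Greedy '+' grabs as much as it can; lazy '+?' stops at the first closing '>'
echo '<a><b>' | grep -oP '<.+>'     # one match: <a><b>
echo '<a><b>' | grep -oP '<.+?>'    # two matches: <a> then <b>
```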

vilijanac

I am confident in my RE skills in programming languages, but grep on the command line always turns into looking at the documentation for me. I know that egrep/grep -E is junk, and I learned to default to always passing the -P flag for the least painful experience.
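The -E vs -P gap shows up immediately with PCRE shorthands like \d, which POSIX ERE doesn't have (it wants [0-9] or [[:digit:]]). A sketch, assuming GNU grep with PCRE support:

```shell
printf 'abc123\n' | grep -oP '\d+'       # PCRE: prints 123
printf 'abc123\n' | grep -oE '[0-9]+'    # the ERE spelling of the same thing
# With -E, '\d' is NOT a digit class; GNU grep treats the backslash-d as a literal 'd'
```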

ForeverZer

I have been using regexes for maybe 25 years, and it is rare that I can write one without looking half of it up again. I pretty much always use Regex101 (the website) to build and test them, and if I'm coding in C#/Java I use Rider/IntelliJ's help to write them (Rider has become incredibly useful). How do you name a group? Non-capturing back reference? What exactly is in a \w?
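For the last question at least there is a short answer: in PCRE, \w is [A-Za-z0-9_] (letters, digits, underscore, plus possible locale/Unicode extras). A sketch with grep -P, assuming GNU grep built with PCRE:

```shell
# The hyphen is not in \w, so it splits the match in two
printf 'foo_bar-baz9\n' | grep -oP '\w+'
# foo_bar
# baz9
```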

der.Schtefan

I've been using UNIX since 1981 and have been dealing with REGEXs since then. Pretty early on I realized that I needed to think of REGEX as a raw capability that exists in many mutated forms in different tools, but one that is expressed slightly differently in each. I figured out what I needed to do and then consulted the manual for the tool to grind through the sequence of characters that would implement the search I needed. For example, it's always a crapshoot whether the REGEX compiler in a particular tool wants '[' et al. escaped with a backslash or standing naked to get the REGEX behaviour.
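The backslash crapshoot in one place: BRE wants interval braces escaped, ERE wants them bare, and the same pattern text silently means different things. A small sketch with GNU grep:

```shell
printf 'aa\n' | grep 'a\{2\}'      # BRE: escaped braces are the repetition operator
printf 'aa\n' | grep -E 'a{2}'     # ERE: bare braces do the same job
printf 'a{2}\n' | grep 'a{2}'      # BRE: bare braces are just literal characters
```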

In some ways I use REGEXs the same way I solved calculus problems. In calculus I could never remember the slew of worked integrals and generally had more success deriving from first principles (my memorization skills are weak). So I kinda do the same with REGEX -- figure out what I want from first principles and then derive the final jumble of characters to implement the match.

I agree that PERL has the better set of regex functions. I think PERL and the PCRE library would be the way I would present REGEXs in any new tool I created.

ksbs

My thought is to have an external modular regex library that lets you set an environment variable to choose which flavor you want to use across all tools. That would allow for backwards compatibility, as everything would behave normally unless you have something like "default_regex=pre" set. The tools would of course have to be rebuilt with this in mind, but it would bring some consistency to the current chaos.
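No such switch exists today, but the idea can be approximated in one's own shell with a wrapper. `DEFAULT_REGEX` and `g` below are made-up names, a sketch of the proposal rather than any real interface:

```shell
# Hypothetical wrapper: pick a grep flavor from an environment variable
g() {
  case "${DEFAULT_REGEX:-bre}" in
    pcre) grep -P "$@" ;;
    ere)  grep -E "$@" ;;
    *)    grep    "$@" ;;   # default: plain BRE grep
  esac
}
DEFAULT_REGEX=ere
printf 'aaa\n' | g 'a+'     # ERE semantics: prints aaa
```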

Seaoftea

At a previous job, I started getting really into my dot files, solving many of the smaller problems I had (they had a really esoteric and out-of-date... everything) with increasingly elaborate bash scripts, running into all the different commands with different versions of regex, different syntaxes, and even different options with similar names.

I was getting pretty out of control (I once wrote a ridiculously inefficient method to find out if an item was in an array, without using subshells, on an older version of bash and without regex or case statements) and eventually tamped down once I realized how ridiculous it was to stay logged into work over the weekend just to work on my dotfiles.
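For the record, a no-subshell, no-regex, no-case membership test doesn't have to be ridiculous; a sketch (the `contains` name is mine):

```shell
# Return 0 if $1 equals any of the remaining arguments; no subshell, no regex
contains() {
  needle=$1; shift
  for item in "$@"; do
    [ "$item" = "$needle" ] && return 0
  done
  return 1
}
contains b a b c && echo 'found'    # prints found
```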

I mean, I now know about a ton of things like shell parameter expansion, the basics of pipes, etc., but I must admit, if anything it made me realize that a lot of what I was doing should probably have just been a Python script at the end of the day, or nothing at all 😅

BeefIngot

3:35 I gave up on simple substitutions with sed on discovering the lack of PCRE several months ago and decided to use straight-up Perl.

yash

This video explains so much! I feel a bit less dumb for randomly failing at regex. I'm gonna check your blog for more info later, thank you!

willft

Unfortunately, PCRE can be tricked into exponential runtime, which is why RE2 was developed. RE2 deliberately omits a lot of features, which is what prevents programmers from accidentally writing matchers that hang on malicious input.

sfllaw

Amen. Agreed, Perl REs should be the default for everything.

vincefinch

I’m amazed at how much I disagree. If you go back to “original Unix”, you had commands like “ed” and “sed” that used what are now called “BRE”. The goal was minimal size because “ed” was the only editor on the boot volume. So if you had a problem at boot that you needed to fix, you had to know how to use “ed” and use it well. “sed”, while not on the boot volume, came from “ed”, and thus it had a simpler regular expression capability. “grep” had BRE and “egrep” had extended regular expressions. “egrep” was the only tool back then with alternations. And, of course, “fgrep” had no regular expressions at all and was nice when grep’ing for dots and other special characters.
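Those historical splits are still visible in today's flags; a quick sketch with GNU grep (grep -E is the old egrep, grep -F the old fgrep):

```shell
# Alternation: originally an egrep/ERE feature (GNU BRE later grew a \| extension)
printf 'cat\ndog\nbird\n' | grep -E 'cat|dog'    # cat and dog
# fgrep / grep -F: no metacharacters at all; handy for literal dots
printf 'a.b\naxb\n' | grep -F 'a.b'              # only the literal a.b
```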

GNU came along and started combining tools into one big mass. This is where the confusion probably started. And, oh, by the way, no: Perl regular expressions are the odd man out. They came late to the game and caused the confusion. “POSIX”, as with all standards, is just for idiots, sorta like NATO and the UN.

So… yea. About the only thing I agree with is one statement that you disagreed with at the beginning of the video: learn your tools.

You also started out the video with a mention of “greedy” and “non-greedy” constructs. “?” and “+” are not either one of those as far as I’m concerned, and if you look in the pages I consulted while trying to verify your video, “greedy” isn’t found. As far as I know, there are various forms of grouping constructs, like parens with special decorations, that denote greedy and non-greedy matching. That’s the only time I’ve seen those two terms used.

Last: Perl was a priceless improvement when it first came out in the late 80s. It was used for scripts that were complex. It was easier to write complex scripts in Perl than in sh or csh. But when bash came out (and probably when ksh came out, but I didn’t use ksh), that advantage was lost and I quit writing Perl scripts, and thus stopped keeping up with its progress, which seemed to stop dead in the early 90s. So that is another reason to ignore the Perl regular expressions. If you seriously need really complex REs, you are probably doing search and replace, not just search, and at that point you throw it into Emacs and use its full power to get done what you need. Also, pipelines of greps can solve the extreme edge cases where complex REs might be needed far more simply than trying to figure out how to do it in one pass.

pedzsan

I have a pretty good understanding of BREs and EREs, and I know pretty well what PCREs can do, even if I have to open "perlre" to figure out how to do it. When I'm grepping, the moment I go for a backslash, I add the -E switch, and when I want a PCRE, I also pretty much want perl itself. After that, if I still haven't solved the problem (and I'm no longer having fun), I reckon I'm using the wrong tool and go write a manual parser in C++ or something.

MCLooyverse