CppCon 2018: Hana Dusíková “Compile Time Regular Expressions”


I will present a library that uses a C++20 feature to build regular expressions from compile-time strings. This novel approach avoids the usual disadvantages of other regular expression implementations, such as having to link against a library or paying the run-time cost of parsing the pattern and interpreting an internal finite-state machine.

You will see implementation details of the library and the problems I ran into while writing it. In the last part of the talk, I will compare other implementations of regular expression engines and show the compiled code in Compiler Explorer.

Hana Dusíková, Avast
Senior Researcher

Hana works as a senior researcher at Avast Software. Her responsibility is exploring new ideas and optimizing existing ones. She also promotes modern C++ techniques and libraries in internal tech talks and gives talks at local C++ meetups.

She studied computer science at Mendel University and subsequently taught several courses there, including Data Structures, Computability and Complexity, and Formal Languages and Automata.

Comments

Good video, enjoyed watching it.
Especially the moment when she revealed the compilation-time benchmarks.

zzzXopHeTzzz

Thank you for sharing this! That's a great use of compile-time evaluation. Can't wait to see it revised for C++20 proper! :)

PROMuffy

Wow, this shows how strong "keep it simple" can be, even with complex things like compile-time evaluation in C++17. Amazing.

jhbonarius

Great presentation of a fantastic idea.

JonathanSharman

Wow! Great job on the lib and talk. Just brilliant

bobbymah

I came across this project some time back when I was creating a similar compile-time pattern matching class of about 100 LOC targeting C++14. Since it involved numerical ranges, the regex representation was somewhat large. This library took only about 8 ns, whereas the hand-rolled version took ~8.5 ns, while std::regex and Intel Hyperscan took about 130 ns. Our library version took ~17.5 ns. Made me realize that Avast has some excellent programmers!

HashanGayasri

Fantastic stuff. Great presentation, I really enjoyed it.

jonathanwatmough

35:56 there is no mistake: Plus and Star are correct. The additional case on the next slide tests whether the loop runs zero times, because inside the loop a "Star..." is applied in every iteration.
37:00 given that, I think the match for star should just
return match(begin, it, end, list<opt<plus<Star...> >, Ts...>{});

ViktorEngelmann
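
The rewrite suggested in the comment above, treating a star as an optional plus, can be sketched with a tiny type-list matcher. This is only an illustration under stated assumptions, not the library's actual code: the names `list`, `ch`, `opt`, `plus`, `star`, `match`, and `matches` are hypothetical, the signature is simplified to two iterators, and a match must consume the whole input.

```cpp
#include <cassert>
#include <cstring>

// All names here are hypothetical and chosen for this sketch only.
template <typename... Ts> struct list {};   // sequence of pattern elements
template <char C>         struct ch {};     // one literal character
template <typename... Ts> struct opt {};    // zero or one occurrence
template <typename... Ts> struct plus {};   // one or more occurrences
template <typename... Ts> struct star {};   // zero or more occurrences

using It = const char*;

// Forward declarations, because the overloads call each other.
inline bool match(It it, It end, list<>);
template <char C, typename... Ts>
bool match(It it, It end, list<ch<C>, Ts...>);
template <typename... Os, typename... Ts>
bool match(It it, It end, list<opt<Os...>, Ts...>);
template <typename... Ps, typename... Ts>
bool match(It it, It end, list<plus<Ps...>, Ts...>);
template <typename... Ss, typename... Ts>
bool match(It it, It end, list<star<Ss...>, Ts...>);

// Empty pattern: succeed only if the whole input was consumed.
inline bool match(It it, It end, list<>) { return it == end; }

// Literal: consume one character, then match the rest of the pattern.
template <char C, typename... Ts>
bool match(It it, It end, list<ch<C>, Ts...>) {
    return it != end && *it == C && match(it + 1, end, list<Ts...>{});
}

// Optional: try with the content first, then backtrack and skip it.
template <typename... Os, typename... Ts>
bool match(It it, It end, list<opt<Os...>, Ts...>) {
    return match(it, end, list<Os..., Ts...>{}) ||
           match(it, end, list<Ts...>{});
}

// Plus: one mandatory occurrence followed by a star of the same content.
template <typename... Ps, typename... Ts>
bool match(It it, It end, list<plus<Ps...>, Ts...>) {
    return match(it, end, list<Ps..., star<Ps...>, Ts...>{});
}

// Star: delegated to opt<plus<...>>, as the comment proposes.
template <typename... Ss, typename... Ts>
bool match(It it, It end, list<star<Ss...>, Ts...>) {
    return match(it, end, list<opt<plus<Ss...>>, Ts...>{});
}

// Convenience wrapper for null-terminated strings.
template <typename Pattern>
bool matches(const char* text) {
    assert(text != nullptr);
    return match(text, text + std::strlen(text), Pattern{});
}
```

With `using P = list<ch<'a'>, star<ch<'b'>>, ch<'c'>>;` the call `matches<P>("abbbc")` corresponds to matching `ab*c`. Note that this sketch recurses forever if the content of a star can match the empty string (for example a star of an `opt`), so it is only safe for star bodies that always consume input.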

About the runtime behavior: I think the examples are somewhat generous, because the regexps are very deterministic, so the downside of backtracking almost never shows up. If you matched a long sequence of 'a' against a*a*b, it would obviously not match, but the algorithm here would spend a lot of (quadratic) time trying all the possible transitions from the first a* to the second. And you can make it a polynomial of any degree just by adding more a*... If you did a**b, a***b etc., I'm not sure how it would behave: I suspect a**b is exponential, and a***b might be double-exponential. I'm not even sure a**b would terminate, because the outer star might add infinitely many a*'s that generate ɛ...

ViktorEngelmann
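
The quadratic blow-up described in the comment above can be observed with a small counting experiment. The following is a minimal sketch of a generic run-time backtracking matcher, not the library from the talk; `Counter`, `match_here`, and `count_steps` are hypothetical names, and the pattern syntax is limited to literal characters and a literal followed by `*`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts how many times the matcher is entered.
struct Counter { std::size_t steps = 0; };

// Backtracking matcher for literals and "x*"; the whole text must match.
inline bool match_here(const char* re, const char* text, Counter& c) {
    ++c.steps;
    if (*re == '\0') return *text == '\0';
    if (re[1] == '*') {
        // Try every possible repetition count of re[0], shortest first,
        // and backtrack whenever the rest of the pattern fails.
        const char* t = text;
        do {
            if (match_here(re + 2, t, c)) return true;
        } while (*t != '\0' && *t++ == re[0]);
        return false;
    }
    // Plain literal: must equal the current character.
    return *text != '\0' && *text == *re && match_here(re + 1, text + 1, c);
}

// Number of matcher entries needed to process a text of n 'a' characters.
inline std::size_t count_steps(const char* re, std::size_t n) {
    assert(re != nullptr);
    const std::string text(n, 'a');
    Counter c;
    (void)match_here(re, text.c_str(), c);
    return c.steps;
}
```

Calling `count_steps("a*b", n)` grows linearly with `n`, while `count_steps("a*a*b", n)` grows roughly like n²/2, because every split point of the first `a*` restarts the second one. Each additional `a*` in the pattern raises the degree by one in this sketch.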

Very nice, but surely, if you're going to compile simple parsers like that, you should use something more well-typed than regular expressions? Doesn't C++ have proper parser combinator libraries by now that achieve the performance simply by inlining the primitive parsers, and can furthermore parse the data right into a suitable type (probably with `std::variant`) and give you compile-time errors when what you're trying to match doesn't have the right shape?
That at least is how this kind of stuff would be done in Haskell, except in very simple, not performance-critical applications where a normal runtime regex engine does the job just fine.

leftaroundabout