DIY Programming Language #2: Tokenising with Finite State Machine

preview_player
Показать описание
In this video I use a Finite State Machine to parse complicated input into manageable tokens before feeding them to the Shunting Yard Algorithm I developed in part 1.

Рекомендации по теме
Комментарии
Автор

12:50, when you said ‘state now’ and ‘state next’ you identified a problem I have been having with my own project that had been throwing me problems for a good part of the year! Thank you!!!’

chrisroode
Автор

I know you’re a good person because you put opening and closing braces on their own lines.

calculus
Автор

I'm brazilian computer engineering student and I love to see your videos because they show me the side beyond the theory, I head about Finite State Machine in my class of "Aspectos teoricos da computação" and we started with Finit Automaton, love it!

roquecosta
Автор

FSMs are one of my favourite CS Concepts. Happy to see you cover them

naitikmundra
Автор

i find it very impressive how your setup never really changes.. i rearrange my monitors and computers every 6 months..

impressive stability

snekkel
Автор

I’m designing an assembler for a home brew instruction set architecture and coincidentally, OLC starts this series on designing custom programming languages! It’s like you’ve sensed that a viewer was in need and came to hold my hand. lol

Love your videos and thanks!

kyleeames
Автор

You always upload exactly what I’m looking for at the perfect time!

matt-xqxv
Автор

Excellent topic! and inspiring to my current project of parsing C++ preprocessor in Python (to convert it into JSON, YAML, CSV, SQL, etc.). And state machines are so fundamental to all kinds of programming that it never hurts to revisit and review the potential of this straightforward but powerful concept.

ScottLahteine
Автор

I can't help but think that a "pure" FSM would deal with decimal points by adding one or more new states, e.g. "seen digits but no point", "seen digits then point then zero or more digits", "seen point" etc

ShotgunLlama
Автор

Awesome t-shirt! Ya seem like you would be an awesome dad! Edit: Love that you leave errors in the code! Reinforces that everyone makes mistakes. Thanks for the video!

ImGelling
Автор

I've never managed to get myself all the way through one of your series, but just seeing you post videos brings a smile to my face.

Have a good day, sir, and keep doing what you do!

jacobmacritchie
Автор

Muahaha! This takes me back. I've written multiple programming languages just for giggles. I remember I became almost OCD with a language where everything is objects and code would be full of monads. What I mean is something like:

123.If > 0
.Then(...)
.Else(..)

I wrote programming language based on XML. I even wrote my own virtual machine. Come to think of it, I've written .NET virtual machine to track execution of programs.

That was back when I had the drive to write stuff on my own time. Then I understood that being professional means you need to be paid to write code 😁

weicco
Автор

Seems fitting to call the language OLC++; OLC# if one feels maverick.

captainufo
Автор

This is one of those "minefield" programming tasks; now we need it to parse numbers formatted with commas instead of decimal points (e.g. 3.1415927 vs 3, 1415927) i.e. make it international. Also maybe apostrophes e.g. $8'100.56 (oh yeah, currencies and units too!)

simaesthesia
Автор

I was just trying to write my own compiler-ish program and this video came out. OLC is always there when you need

brawldude
Автор

*Summary*

This video focuses on building a more robust parser for the DIY programming language using a finite state machine (FSM).

*Key points:*

* *(**0:00**)* Introduction and recap of the limitations of the previous shunting yard algorithm.
* *(**7:28**)* Implementing the FSM in code.
* *(**7:47**)* Housekeeping: Adding standard libraries, updating the `Symbol` struct to `Token`, and creating the `Compiler` class.
* *(**12:02**)* Implementing the FSM states (`new token`, `numeric literal`, `operator`, `complete token`).
* *(**17:47**)* Creating a custom "is digit" function for performance and flexibility.
* *(**21:35**)* Handling whitespace in the input.
* *(**26:54**)* Adding parenthesis handling and a parenthesis balance checker.
* *(**32:01**)* Introducing symbols for variables, function names, etc., and the corresponding FSM state.
* *(**36:18**)* Handling hexadecimal and binary numeric literals.
* *(**40:35**)* Adding additional token types for future use (parenthesis, scope, separator, string literal, keyword).
* *(**41:24**)* Implementing the `string literal` state.
* *(**46:35**)* Integrating the shunting yard algorithm with the new token-based system.
* *(**47:10**)* Demonstrating the improved calculator with different numeric types and syntax checking.

*Result:*

A more sophisticated calculator that can handle complex expressions with different numeric types and basic syntax checking. Future videos will build upon this foundation to implement more advanced programming language features.


i used gemini 1.5 pro to summarize the transcript

wolpumba
Автор

Nice implemenation of simple language. Takes me back to learning lex/yacc (or bison for open source folks), and 'antlr' for Java platforms. Using those, taught myself a lot about 'Backus-Naur' form of describing programming languages. It dovetails nicely with your FSM description of your development.

I note that some of your if-checks are very order sensitive but you don't really mention it. You check white-space first, and numerics before symbols to ensure the right tokens are recognized first.

And so far you don't differentiate between integers and reals. This can be another fun bit to explore a little. Then we could have fun with '<<' or '>>' as shift operators only valid on integer types. Or maybe supporting octal by recognizing a leading zero?? A lot of fun bits to mess around with. :)

mikefochtman
Автор

Some time ago i've made my first compiler, and probably has the most simple tokenizer ever. Its basicly divide all the strings were there is some type of "separator" like spaces, operators and marker simbols, then for the classification i only checked if it matched with the predicted symbols of the language, if its not a simbol its a number, variable or a error (my language only has variable and number as data ), so all i needed is to check if it was numeric or if the variable regex was a match. I got really proud of it :D

MrVinicius
Автор

Ooo right up my street. Grabs some popcorn and off we go!

stephenelliott
Автор

That reminds me of when i made a json parser. Surprised to see that smart people actually do it similar to how i did it. Definitely have to revisit the project and see if i can improve something

xnocki