DIY Programming Language #2: Tokenising with Finite State Machine

Показать описание

In this video I use a Finite State Machine to parse complicated input into manageable tokens before feeding them to the Shunting Yard Algorithm I developed in part 1.

Рекомендации по теме

Комментарии

12:50, when you said ‘state now’ and ‘state next’ you identified a problem I have been having with my own project that had been throwing me problems for a good part of the year! Thank you!!!’

chrisroode

I know you’re a good person because you put opening and closing braces on their own lines.

calculus

I'm brazilian computer engineering student and I love to see your videos because they show me the side beyond the theory, I head about Finite State Machine in my class of "Aspectos teoricos da computação" and we started with Finit Automaton, love it!

roquecosta

FSMs are one of my favourite CS Concepts. Happy to see you cover them

naitikmundra

i find it very impressive how your setup never really changes.. i rearrange my monitors and computers every 6 months..

impressive stability

snekkel

I’m designing an assembler for a home brew instruction set architecture and coincidentally, OLC starts this series on designing custom programming languages! It’s like you’ve sensed that a viewer was in need and came to hold my hand. lol

Love your videos and thanks!

kyleeames

You always upload exactly what I’m looking for at the perfect time!

matt-xqxv

Excellent topic! and inspiring to my current project of parsing C++ preprocessor in Python (to convert it into JSON, YAML, CSV, SQL, etc.). And state machines are so fundamental to all kinds of programming that it never hurts to revisit and review the potential of this straightforward but powerful concept.

ScottLahteine

I can't help but think that a "pure" FSM would deal with decimal points by adding one or more new states, e.g. "seen digits but no point", "seen digits then point then zero or more digits", "seen point" etc

ShotgunLlama

Awesome t-shirt! Ya seem like you would be an awesome dad! Edit: Love that you leave errors in the code! Reinforces that everyone makes mistakes. Thanks for the video!

ImGelling

I've never managed to get myself all the way through one of your series, but just seeing you post videos brings a smile to my face.

Have a good day, sir, and keep doing what you do!

jacobmacritchie

Muahaha! This takes me back. I've written multiple programming languages just for giggles. I remember I became almost OCD with a language where everything is objects and code would be full of monads. What I mean is something like:

123.If > 0
.Then(...)
.Else(..)

I wrote programming language based on XML. I even wrote my own virtual machine. Come to think of it, I've written .NET virtual machine to track execution of programs.

That was back when I had the drive to write stuff on my own time. Then I understood that being professional means you need to be paid to write code 😁

weicco

Seems fitting to call the language OLC++; OLC# if one feels maverick.

captainufo

This is one of those "minefield" programming tasks; now we need it to parse numbers formatted with commas instead of decimal points (e.g. 3.1415927 vs 3, 1415927) i.e. make it international. Also maybe apostrophes e.g. $8'100.56 (oh yeah, currencies and units too!)

simaesthesia

I was just trying to write my own compiler-ish program and this video came out. OLC is always there when you need

brawldude

*Summary*

This video focuses on building a more robust parser for the DIY programming language using a finite state machine (FSM).

*Key points:*

* *(**0:00**)* Introduction and recap of the limitations of the previous shunting yard algorithm.
* *(**7:28**)* Implementing the FSM in code.
* *(**7:47**)* Housekeeping: Adding standard libraries, updating the `Symbol` struct to `Token`, and creating the `Compiler` class.
* *(**12:02**)* Implementing the FSM states (`new token`, `numeric literal`, `operator`, `complete token`).
* *(**17:47**)* Creating a custom "is digit" function for performance and flexibility.
* *(**21:35**)* Handling whitespace in the input.
* *(**26:54**)* Adding parenthesis handling and a parenthesis balance checker.
* *(**32:01**)* Introducing symbols for variables, function names, etc., and the corresponding FSM state.
* *(**36:18**)* Handling hexadecimal and binary numeric literals.
* *(**40:35**)* Adding additional token types for future use (parenthesis, scope, separator, string literal, keyword).
* *(**41:24**)* Implementing the `string literal` state.
* *(**46:35**)* Integrating the shunting yard algorithm with the new token-based system.
* *(**47:10**)* Demonstrating the improved calculator with different numeric types and syntax checking.

*Result:*

A more sophisticated calculator that can handle complex expressions with different numeric types and basic syntax checking. Future videos will build upon this foundation to implement more advanced programming language features.

i used gemini 1.5 pro to summarize the transcript

wolpumba

Nice implemenation of simple language. Takes me back to learning lex/yacc (or bison for open source folks), and 'antlr' for Java platforms. Using those, taught myself a lot about 'Backus-Naur' form of describing programming languages. It dovetails nicely with your FSM description of your development.

I note that some of your if-checks are very order sensitive but you don't really mention it. You check white-space first, and numerics before symbols to ensure the right tokens are recognized first.

And so far you don't differentiate between integers and reals. This can be another fun bit to explore a little. Then we could have fun with '<<' or '>>' as shift operators only valid on integer types. Or maybe supporting octal by recognizing a leading zero?? A lot of fun bits to mess around with. :)

mikefochtman

Some time ago i've made my first compiler, and probably has the most simple tokenizer ever. Its basicly divide all the strings were there is some type of "separator" like spaces, operators and marker simbols, then for the classification i only checked if it matched with the predicted symbols of the language, if its not a simbol its a number, variable or a error (my language only has variable and number as data ), so all i needed is to check if it was numeric or if the variable regex was a match. I got really proud of it :D

MrVinicius

Ooo right up my street. Grabs some popcorn and off we go!

stephenelliott

That reminds me of when i made a json parser. Surprised to see that smart people actually do it similar to how i did it. Definitely have to revisit the project and see if i can improve something

xnocki

DIY Programming Language #2: Tokenising with Finite State Machine

DIY Programming Language #2: Tokenising with Finite State Machine

diy programming language 2 tokenising with finite state machine

DIY Programming Language

How To Java: Basic Programming Language #2 - Tokenizer

DIY Programming Language #1: The Shunting Yard Algorithm

Create Your Own Programming Language - The Tokeniser (Part 2)

Let's build the GPT Tokenizer

Want To Make A PROGRAMMING Language?! #2 Watch THIS Video! [Compiler]

Making a programming language in 3 videos | Part 1: Tokenizer

LambdaConf 2015 - Make Your Own Programming Language! Nick Heiner

2018 - MAKE YOUR OWN PROGRAMMING LANGUAGE - LEXER (1/3)

A Language Tokenizer in Elixir

how I sped up python's tokenize module by 25% (intermediate) anthony explains #221

Writing a Self-Tokenizing Programming Language From Scratch | Coding Day 111

Create a Programming Language - Episode 4: Tokenizing Numbers

[Let's Make a Programming Language] Episode 2: Building the parser part 1 (Creating the syntax)

[Let's Make a Programming Language] Episode 7: Building the tokenizer part2

Let's train an AI model to generate recipes! 🍪 #python #ailearning #huggingface

How To Build A Programming Language From Scratch

Creating a Programming Language In Discord? Part 2. Parser

Devlang #2 - Starting tokenising

My Own Programming Language in 2 Hours — Offline Stream #08

I Made My Own Programming Language

Create Your Own Programming Language - The Tokeniser (Part 4)