How Are State Machines Used In Parsing? (16-Bit VM in JavaScript 008)

preview_player
Показать описание
In this episode we begin building the parser for this VMs assembly language. During the process we'll encounter a tricky problem with parser combinators, and how we can solve it using state machines.

=[ ℹ About ℹ ]=

This series is all about building a powerful virtual machine in JavaScript with the following features:

- A flexible, extensible, register-based virtual machine
- Support for signed, unsigned and floating point operations
- A call stack
- Interrupt capabilities
- Ability to do memory mapping for IO
- An assembly language with macro and module support
- A higher level, C like language. We'll use and expand the library from the parser combinators from scratch series
- And finally, to be able to take the whole thing into the browser and extend it to create a sort of fantasy console - an emulator for a machine that never existed

Рекомендации по теме
Комментарии
Автор

> when I log to console, the result body says
>{ result: Object [Generator] { [Symbol(Symbol.toStringTag)]: 'Generator' }, ...} seems like the generator never runs

Use "run" instead:

const movLitToReg = A.coroutine(run => {
run(upperOrLowerStr('mov'));
run(A.whitespace);

const arg1 = run(hexLiteral);

run(A.optionalWhitespace);
run(A.char(', '));
run(A.optionalWhitespace);

const arg2 = run(register);
run(A.optionalWhitespace);

return asType('INSTRUCTION')({
instruction: 'MOV_LIT_REG',
args: [arg1, arg2]
});
});

masakore
Автор

Opened youtube and see this uploaded 11 seconds ago! Today is a good day! :)

theitatit
Автор

Another way to define a expression is as:

expr: term ("*" term)*
term: factor (("+"|'-") factor)*
factor: literal | variable | "(" expr ")"

rayquazaboladao
Автор

great series, kinda wondering if I am missing something though, when I log to console, the result body says
{ result: Object [Generator] { [Symbol(Symbol.toStringTag)]: 'Generator' }, ...} seems like the generator never runs

macrossfiru
Автор

I like the explanation of FSMs :) and the practical use of arcsecond


I have a question, though. What about the order of arithmetic operators? e.g. in "1 + 2 * 3", I would expect the result to be 7, not 9 :)
From what I know, constructing the AST is the next phase of parsing (syntax analysis), and this is lexical analysis so far. Are you going to address that in future videos?

Gelio
Автор

There seems to be a jump here from the previous videos in this playlist. As someone who didn't follow your 'Creating a parser' playlist (which I suppose was the first one), should I visit that series before continuing this one? Got a bit lost here.

giuxbr
Автор

How did you decide on the syntax?

Why this?
[$100 + !loc - ($100 + !loc - ($05 * $31)] <- square bracket mixed with bracket

Why not this?
($100 + !loc - ($100 + !loc - ($05 * $31)) <- using one type of bracket would be consistent and reduces elements within the language?

I bet there is a valid reason why you did what you did, does anyone know?

geogreenmanu
Автор

You say we need the FSM because the first Grammar

expr = hex | var | ( expr ) | mathOperation
mathOperation = expr op expr
op = + | - | *

is too eager, and because the second

expr = mathOperation | var | ( expr ) | hex
mathOperation = expr op expr
op = + | - | *

is infinitely recursive in the sense.
However, the way I see it, the problem lies with the fact that your grammar is ambiguous. From your example "$42 + !loc - ($05 * $31)", it is not clear whether the end resoult should be parsed as (with {} signifying nesting of the parsing rules)

{hex op var} op ({hex op hex})

or as

hex op {var op ({hex op hex})}

since the rule for mathOperation can expand to both "{expr op expr} op expr" and "expr op {expr op expr}". A fix that doesn't require FSM would be to introduce a new rule simpleExpr instead, where [a] means 0..* repetitions of the rules.

expr = mathOperation
mathOperation = simpleExpr [op simpleExpr]*
op = + | - | *
simpleExpr = hex | var | ( expr )

This way, any recursion that doesn't consume a non-terminal is eliminated, and the only "problem" left in this case would be associativity and op precedence (the second can be fixed with a split mathOperation rule, or both can be solved in a post-processing stage after a compound expression is completely parsed.

Right?

Adowrath