Self Compiling Compilers - Computerphile

Using T-Diagrams, Professor Brailsford shows us how to take our compiler to the next level.

This video was filmed and edited by Sean Riley.

Comments

Back in the '80s we used to do this sort of thing all the time to extend languages. You would write a compiler that understood the new language extensions, compile it under your existing compiler, then write another compiler that used the language extensions, and compile that using the extended compiler.
E.g. if your language does not have constants, you write a compiler that can lex/parse/codegen the term "CONST <Identifier> <Value>".
Then you write a new compiler that uses CONSTs and compile it with the CONST-aware compiler (you could even feed this back through itself to generate a better compiler).
A few simple tweaks to the compiler code and you can now have ENUMs, and so on.


The most fun was writing optimising compilers and feeding them through iterations of themselves :)
FORTH was an especially great language to do this process on

Kyrelel
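
The CONST step described in the comment above can be sketched roughly as follows. This is only a toy illustration: the `PRINT` statement, the `PUSH`/`OUT` instruction names, and both function names are assumptions made up for the example, and it shows the language-extension step only, not a full self-hosting bootstrap.

```python
def compile_base(source: str) -> list[str]:
    """Base toy compiler: only understands 'PRINT <integer>' lines."""
    code = []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "PRINT" and len(parts) == 2:
            code.append(f"PUSH {int(parts[1])}")
            code.append("OUT")
        else:
            raise SyntaxError(f"unknown statement: {line!r}")
    return code


def compile_with_const(source: str) -> list[str]:
    """Extended toy compiler: also understands 'CONST <Identifier> <Value>'."""
    constants: dict[str, int] = {}
    code = []
    for line in source.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "CONST" and len(parts) == 3:
            # A constant only updates the symbol table; it emits no code.
            constants[parts[1]] = int(parts[2])
        elif parts[0] == "PRINT" and len(parts) == 2:
            operand = parts[1]
            value = constants[operand] if operand in constants else int(operand)
            code.append(f"PUSH {value}")
            code.append("OUT")
        else:
            raise SyntaxError(f"unknown statement: {line!r}")
    return code


if __name__ == "__main__":
    program = "CONST LIMIT 10\nPRINT LIMIT\nPRINT 3"
    print(compile_with_const(program))
```

The base compiler rejects any program that uses `CONST`, while the extended one folds the constant into the generated code. In a real bootstrap, the extended compiler would first be written without `CONST`, built with the base compiler, and only then rewritten to use the new feature.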

I once read a warning in a book about this: a malicious developer could insert a piece of code that detects when the compiler is compiling itself and re-injects the code that inserts the backdoor. And when the compiler compiles something else, it can simply install the backdoor into that program. After the compiler has compiled itself with the malicious code once, the developer can even purge the code from version control to cover their tracks. This is the main reason why you must keep all the versions of your compiler. If the backdoor is detected only several versions later, you have to go back to the last non-backdoored version and start the bootstrapping process again.

Kotfluegel
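
A very reduced model of the attack described above might look like this. Everything here is hypothetical: the `Binary` class, its flags, and `compile_program` are invented for the illustration, and "compiling" is reduced to deciding which flags the output carries. The key assumption is that every source file is clean; only the compiler binary is compromised.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    name: str
    backdoored: bool = False  # login program accepts a secret password
    trojaned: bool = False    # compiler re-inserts the attack when it builds


def compile_program(compiler: Binary, source_name: str) -> Binary:
    """Toy model: build the (clean) source called source_name with compiler."""
    if not compiler.trojaned:
        return Binary(source_name)  # honest compiler: output matches source
    if source_name == "login":
        return Binary(source_name, backdoored=True)  # inject the backdoor
    if source_name == "compiler":
        return Binary(source_name, trojaned=True)    # pass the trojan on
    return Binary(source_name)


if __name__ == "__main__":
    evil = Binary("compiler", trojaned=True)
    next_gen = compile_program(evil, "compiler")  # built from clean source
    print(next_gen)
    print(compile_program(next_gen, "login"))
```

In this model the trojan survives every rebuild from clean source, which is why going back to a known-good compiler binary is the only way out.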

Fun fact: the C# compiler is written in C#. The TypeScript transpiler is also written in TypeScript. There are many examples of self-compiling compilers.

VishvakaRanasinghe

Just realized this is why Gentoo compiles gcc twice when doing an upgrade (or used to, at least). One of those "it's glaringly obvious but I had to have someone else point it out to me before I noticed" moments.

sharkuc
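
Why a compiler gets rebuilt with itself can be sketched with a toy model. This is not GCC's actual build system: `CompilerBinary`, `build`, and the version labels are assumptions for the example. The idea is that a compiler binary has two properties: the code-generation rules it implements (from its source) and the rules that were used to produce the binary itself (from whatever compiled it).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompilerBinary:
    generates: str   # code-generation rules this compiler implements
    built_with: str  # code-generation rules used to produce this binary


def build(host: CompilerBinary, source_version: str) -> CompilerBinary:
    """Compile the compiler source source_version using the host compiler."""
    return CompilerBinary(generates=source_version, built_with=host.generates)


if __name__ == "__main__":
    old = CompilerBinary(generates="v1", built_with="v1")
    stage1 = build(old, "v2")     # new logic, but old-quality machine code
    stage2 = build(stage1, "v2")  # new logic, new-quality machine code
    stage3 = build(stage2, "v2")  # should change nothing further
    print(stage1, stage2, stage3, sep="\n")
    print("fixed point reached:", stage2 == stage3)
```

The first rebuild only gives the new compiler's logic; the second lets the new compiler benefit from its own improvements. A third build that matches the second is a useful sanity check that the process has reached a fixed point.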

I remember doing something similar when I was 14... I had an assembler running in BASIC that was extremely slow, so I wrote a new one in assembler, got the BASIC assembler to assemble that, and then used the binary version of the assembler to improve and complete itself. This was back in the day when 6502 assembler was a thing. I was happy with the result: in the end it assembled itself something like 100 times faster than the BASIC version.

threeMetreJim

The machining tools analogy really unlocked this for me! 👍

fullerdb

The best videos on this channel are those with Professor Brailsford.

I guess the compiler series also has some episodes about disassembly and then decompilation. (Decompilation is hard, because of UB (undefined behavior) in C (the target language))

Edit: the series should also contain a video about undefined behavior in C and C++, because it is a serious topic.

cmdlp

I remember a story from "the Jargon File" about someone modifying the compiler to install a backdoor in the login software and a backdoor seed in the compiler, then compiling it with itself and removing all traces.

jasongladen

Very interesting angle on self compiling compilers - improving quality of code and compiler. My own career experience (forty years!) was doing this as a technique to achieve portability - to run software on new machines / chipsets etc. Excellent - thanks!

Richardincancale

Great to see that the stock of dot matrix printing paper is still going strong.

rationalityfirst

Ever since I learned that self compiling compilers exist I live in fear

incelstate

For languages other than C, the first step is probably going to involve a simple compiler for the new language written in C rather than in ASM. We can do that because someone has already written a working C compiler for almost every platform, and C is much easier to use than ASM.

MonochromeWench

This greatly reminds me of the RepRap Project, especially since you referred to it, knowingly or not, when talking about self-replicating 3D printers. They very much exist, and they are commercially viable, given how they've gotten the desktop 3D printing industry to drop prices like mad.

topsecret

Quite interesting. I recently started writing my own compiler (mitsy, if anyone is interested), and right now I'm working on the most essential part of it, namely the bit that takes C literals and transforms them into hard values. The part I'm struggling with is the exponent on floats; everything else has been a relative breeze as far as literals go.

zxuiji
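
One possible way to handle the exponent of a decimal float literal is to collect all digits into a single integer, track how many came after the decimal point, and fold that count into the exponent before scaling once. The sketch below is an assumption about how this could be done, not how mitsy does it; `parse_decimal_float` is a hypothetical helper, and it deliberately ignores C suffixes (`f`, `L`) and hexadecimal floats.

```python
DIGITS = "0123456789"


def parse_decimal_float(text: str) -> float:
    """Parse digits[.digits][(e|E)[+|-]digits] without calling float(text)."""
    pos = 0
    mantissa = 0     # all significant digits as one integer
    frac_digits = 0  # how many of those digits followed the '.'
    seen_digit = False

    while pos < len(text) and text[pos] in DIGITS:
        mantissa = mantissa * 10 + int(text[pos])
        pos += 1
        seen_digit = True
    if pos < len(text) and text[pos] == ".":
        pos += 1
        while pos < len(text) and text[pos] in DIGITS:
            mantissa = mantissa * 10 + int(text[pos])
            frac_digits += 1
            pos += 1
            seen_digit = True
    if not seen_digit:
        raise ValueError("literal has no digits")

    exponent = 0
    if pos < len(text) and text[pos] in "eE":
        pos += 1
        sign = 1
        if pos < len(text) and text[pos] in "+-":
            sign = -1 if text[pos] == "-" else 1
            pos += 1
        start = pos
        while pos < len(text) and text[pos] in DIGITS:
            exponent = exponent * 10 + int(text[pos])
            pos += 1
        if pos == start:
            raise ValueError("exponent has no digits")
        exponent *= sign
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")

    # Each fractional digit shifts the value down by one power of ten.
    scale = exponent - frac_digits
    if scale >= 0:
        return float(mantissa * 10 ** scale)
    return mantissa / 10 ** (-scale)
```

Doing the scaling with exact integers and a single final division avoids accumulating rounding error from repeated multiplication by 10.0 or 0.1. A production lexer would also need to bound very large exponents, which this sketch does not attempt.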

Another way of dealing with bugs in the bootstrapping process is to write a C compiler that is really simple to compile, one that BinA can compile and that can compile itself, then write a second, more advanced version of the C compiler and work on getting the first C compiler to be able to compile that. Adding a third C compiler could even be needed, and for a very complicated language I could see even more stages being helpful.

JoshuaHillerup

Brings back difficult memories from university. I had to write a Scheme interpreter in Scheme. Compilation was the only class I abandoned.

nicflatterie

When I see Prof. Brailsford, I click faster than Thanos can snap.

LordomusPL

Please make a video on "Reflections on Trusting Trust" by Ken Thompson.

NextLevelNoob

Cf. Ken Thompson's 1984 Turing Award lecture "Reflections on Trusting Trust"

LazyToad

I'm glad you made this video. I had read somewhere that a compiler needs to be compiled by itself again and again to make sure it doesn't break or introduce bugs, but it was never explained that the quality of the later versions would be better, so I thought it would get worse. Your explanation made it very clear why not :)

LLoydsensei