About Commodore 64 BASIC Abbreviations

Jeff Birt asks about the how and why of C64 BASIC abbreviations, and I do my best to explain it all. Are ? and PRINT the same thing? Why does typing "gosuB" result in "mid$su"?? Let's dig in with help from Transactor magazine, the machine language monitor in the Super Snapshot cartridge, and the Complete Inner Space Anthology.

Subscribe to Hey Birt! on YouTube:

Index:
0:00 Jeff’s Question
1:42 Robin’s Intro
2:05 Transactor Magazine
3:37 BASIC abbreviation examples
5:07 About crunching
6:02 More unusual abbreviations
8:03 No abbreviation for INPUT
8:50 Examining the buffer
10:39 Karl Hildon’s Complete Inner Space Anthology
12:12 Jim Butterfield’s SuperChart
12:52 About keyword tokens in BASIC memory
17:02 Inspecting the BASIC keyword table in ROM
19:19 The bizarre gosuB
22:20 Using BRKs to inspect the crunch; about vectors
26:48 An example of crunching
29:13 Disassembling the tokenization routine
35:00 Finally!
36:25 Ending
Comments

6:29 "pR" is "print#" because if two tokens have matching prefixes, the longer one needs to come first in the search table so that the shorter one doesn't permanently "hide" the longer one.

8:21 If the high bit is set in the input text, it effectively acts as a wildcard that matches the first entry found with the common prefix, so "iN" and "inpU" match "input#". You can't seem to set the high bit on the last character of a token without getting weirdness.

10:00 As I understand it, there are only three $00 bytes that end a BASIC program: the $00 that terminates the last line and then $00 $00 as the "null pointer" link that indicates there are no more lines. The fourth $00 in your display is probably the content of your previous example lines overwriting each other.

10:40 The Anthology is my go-to reference. The reprint included the C128 memory map.

19:02 If "input" came before "input#", then "input#" would always be parsed as the token "input" followed by the character "#" and the "input#" token would never be found using the simple linear-search method implemented in the token cruncher.

23:58 Lots of BASIC wedges cheap out and intercept errors or take over the zero-page lexical scanner, but proper BASIC extensions take over the various extension vectors and process the extension keywords in the same way as stock BASIC.

csbruce
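
To illustrate the ordering point in the comment above, here is a minimal sketch of a first-match linear search. It is not the ROM routine: `first_match`, `rom_order` and `swapped` are hypothetical names, and only four keywords are modelled.

```python
def first_match(text, table):
    """Return the first table entry that is a prefix of text, scanning in order."""
    for word in table:
        if text.startswith(word):
            return word
    return None

# Longer keyword first, as described in the comment.
rom_order = ["INPUT#", "INPUT", "PRINT#", "PRINT"]
# Hypothetical order with the shorter keyword first.
swapped = ["INPUT", "INPUT#", "PRINT", "PRINT#"]

print(first_match("INPUT#1,A$", rom_order))  # INPUT#
print(first_match("INPUT#1,A$", swapped))    # INPUT (the "#" is left over as a plain character)
print(first_match("INPUT A$", rom_order))    # INPUT
```

With the shorter keyword first, no input can ever reach the longer entry, which is the "hiding" effect described at 6:29 and 19:02.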

Some notes on this…
1) Fun fact: The PET 2001 ROM V.1.0 (the one without the hex monitor) didn't have a "GO" token; this came only with ROM 2.0. If you try "gosuB" with this one, it will just tokenize the PETSCII character values for "gosu", character by character, missing the "B", since any bare characters with the sign bit set ($80 and higher) will be ignored. (This is also why there's the special case for the pi character, $FF, in the tokenizer routine.)

2) The magical difference $80 only works because it's actually the absolute difference, which is the really clever bit in this, or the crucial bug, depending on your point of view. Negative $80 is the same as positive $80 in 8 bits! So it doesn't matter which of the two operands is bigger; it's all about the absolute offset in PETSCII values. This wouldn't work with any other offset. (Also, it's just the sign bit set, quite handy for coding.)

3) The "mid$" mystery also works with "gotO", resulting in "mid$t", but no other token.

4) On the "gosuB"/"mid$su" bug
What actually seems to happen is that the matching routine fails on matching "gosub" because both the parsed string and the token have the sign bit set on the last character, so there is no difference, indicating to the tokenizer that there's still another character to match. At the same time, we also hit the end of a word, so the pointers into the input string and the keyword table are reset in order to check for the next token, but without incrementing the keyword/token index. (As far as the token matching is concerned, we're still looking at the same keyword, since we had a perfect match on the last character, indicating more to come.) As a result, the keyword index lags behind by 1 as the search resumes with an attempt to match "return". While the token index for "return" is actually 14, it is now 13, and so on. When the search eventually finds a match, we're still one token count behind, resulting in "mid$".
The same bug applies to "gotO", starting over with the index for "goto" being reused for "run" (now 9 instead of 10). This only works because "go" is a substring of "gosub" and "goto", so there will eventually be a match using the degraded token index.
Somewhat related, if we try to input "10 input<C= + T>" (<C= + T> is the graphics character resulting from "#" with the sign-bit set), we get neither the "input#" token nor the token for "input", but the PETSCII character sequence "input" (not tokenized). The same is probably true for "open#" with the sign-bit set on the last character.

Edit: Actually, we just spill over into the next keyword as the uppercase "B" in the input buffer matches the last character of "gosuB" in the keyword table and the pointer into the input buffer will be reset only as we fail the match on the next keyword ("return").

noland
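
Point 2 above can be checked with a few lines of arithmetic. This is a small sketch, assuming the comparison is an 8-bit subtraction whose result is compared against $80; `diff8` is a hypothetical helper name.

```python
def diff8(a, b):
    """8-bit subtraction with wrap-around, like the 6502's SBC result byte."""
    return (a - b) & 0xFF

plain_b, shifted_b = 0x42, 0xC2  # "B" without and with the sign bit set

# The result is $80 whichever operand carries the sign bit.
print(hex(diff8(shifted_b, plain_b)))  # 0x80
print(hex(diff8(plain_b, shifted_b)))  # 0x80

# Offsets that equal their own negation in 8 bits: only 0 and $80.
symmetric = [d for d in range(256) if (-d) & 0xFF == d]
print(symmetric)  # [0, 128]

# Any other offset is direction-dependent, for example $20:
print(hex(diff8(0x62, 0x42)), hex(diff8(0x42, 0x62)))  # 0x20 0xe0
```

So a difference of $80 only says that the two bytes differ in the sign bit and nowhere else; it does not say which of the two had the bit set.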

All these years thinking it was an intentional feature. Thanks a lot, really appreciate these vids even though I rarely use basic these days.

lostindesolation

Suddenly I remember how I expanded the C64 BASIC by altering these vectors, intercepting the tokenize, detokenize and run processes and adding my own lame stuff. It began with a book outlining the work, and off I went to do slow stuff fast, mostly for costly things like scrolling (move, fill, colors), sprites and hires, and I even stubbed my toes on the SID. Happy days! Thank you for these videos.

KetilDuna

Thanks so much for this video (and the others you've made). I'm just getting into retrocomputing after picking up a C64 recently. I have been having a lot of fun with it, and have learned a lot from your terrific videos.

rga

I think a great part 2 to this video would be showing how to add additional BASIC commands via wedging CHRGET. This is something the 11-year-old me found to be magic even though I had a good understanding of machine code and 6502. It still amazes me that I managed to learn 6502 with only a machine code monitor and no books, documentation or teacher. I just worked it out and strangely felt more comfortable in 6502 than BASIC! I think I still have somewhere my original handwritten decimal, hex, PETSCII and mnemonics lookup tables, similar to what was in your book.

dwhxyz

Wow, who would have suspected that such a simple thing that seems to be a deliberate choice to aid productivity actually was a subtle bug that opens up how tokenisation in BASIC is performed? Amazing!

MrGoatflakes

gosuB -> mid$su explanation...

Norbert Landsteiner beat me to the punch on this one, but since I stepped through the ROM to figure this out I wanted to post my own short dedicated explanation...

The crunch routine actually identified the GO token. However, by that time the token counter that indexes which token is being compared had already "slipped a cog," so it was off by one. Every time there is a mismatch, the routine skips ahead to the end of the token (the shifted last letter) and increments the counter. Since the B in gosuB matched exactly, the routine was fooled into thinking it was comparing against a single token called gosuBreturn, which failed... but it only resulted in a single increment rather than two.

AgentFriday
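
The "slipped cog" can be reproduced with a simplified model of the matching loop as the comments describe it. This is a sketch, not a byte-exact port of the ROM: quotes, REM and DATA handling are left out, the model stops at the end of the keyword table, and `petscii`, `crunch` and `listing` are hypothetical names. Lowercase letters stand for unshifted keys and uppercase letters for shifted keys.

```python
# Keyword order of the C64 BASIC V2 table; token = $80 + index.
KEYWORDS = (
    "END FOR NEXT DATA INPUT# INPUT DIM READ LET GOTO RUN IF RESTORE GOSUB "
    "RETURN REM STOP ON WAIT LOAD SAVE VERIFY DEF POKE PRINT# PRINT CONT LIST "
    "CLR CMD SYS OPEN CLOSE GET NEW TAB( TO FN SPC( THEN NOT STEP + - * / ^ "
    "AND OR > = < SGN INT ABS USR FRE POS SQR RND LOG EXP COS SIN TAN ATN "
    "PEEK LEN STR$ VAL ASC CHR$ LEFT$ RIGHT$ MID$ GO"
).split()

# Table bytes: last character of each keyword has bit 7 set; $00 ends the table.
TABLE = b"".join(
    w[:-1].encode("ascii") + bytes([ord(w[-1]) | 0x80]) for w in KEYWORDS
) + b"\x00"

def petscii(text):
    """Unshifted letters are $41-$5A, shifted letters $C1-$DA; $00 ends the line."""
    out = bytearray()
    for ch in text:
        if ch.isupper():
            out.append(ord(ch) | 0x80)
        else:
            out.append(ord(ch.upper()))
    return bytes(out) + b"\x00"

def crunch(text):
    buf, out, x = petscii(text), bytearray(), 0
    while buf[x] != 0:
        c = buf[x]
        if c & 0x80:                       # bare shifted character: dropped
            x += 1
            continue
        if c == 0x20 or 0x30 <= c <= 0x3B:  # space, digits, ":" and ";" pass through
            out.append(c)
            x += 1
            continue
        if c == 0x3F:                      # "?" is stored as the PRINT token
            out.append(0x99)
            x += 1
            continue
        start, y, count, token = x, 0, 0, None
        while TABLE[y] != 0:
            diff = (buf[x] - TABLE[y]) & 0xFF
            if diff == 0:                  # same byte: keep going, even past a keyword's end
                x += 1
                y += 1
            elif diff == 0x80:             # differs only in bit 7: accept as a match
                token = 0x80 | count
                x += 1
                break
            else:                          # mismatch: rewind input, count ONE keyword
                x = start
                count += 1
                while not TABLE[y] & 0x80:  # skip to the shifted last letter
                    y += 1
                y += 1
        if token is None:                  # no keyword matched: store the character as-is
            x = start
            out.append(buf[x])
            x += 1
        else:
            out.append(token)
    return bytes(out)

def listing(tokens):
    """Expand token bytes back to keywords, roughly as LIST would show them."""
    return "".join(
        KEYWORDS[b - 0x80].lower() if b >= 0x80 else chr(b).lower() for b in tokens
    )

print(listing(crunch("gosuB")))  # mid$su
print(listing(crunch("gotO")))   # mid$t
print(listing(crunch("goS")))    # gosub
print(listing(crunch("pR")))     # print#
```

In this model "gosuB" matches all five bytes of GOSUB exactly, runs on into RETURN, and the following mismatch counts the two keywords as one. By the time "go" matches GO, the counter reads 74 instead of 75, which is the token for MID$.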

I hadn't heard of The Transactor until this video. I am devastated I couldn't get it back in the day. The articles are fantastic. I've been reading a ton of them since this video. Thank you sir!

BillSzczytko

Wow, great video Robin. Some serious research went into that, thank you !!

slaine

I just looked up Karl Hildon’s Complete Inner Space Anthology and found a PDF version. And I gotta say WOW! It's basically a big book of "cheat sheets" that covers an absolute ton of stuff and it even covers the Plus/4 and C16. Thanks for cluing me into this amazing book, this is going to come in handy for a couple different projects I have going on. Hopefully I can snag a copy on eBay as this is valuable reference material to add to the collection.

LeftoverBeefcake

There was a book with an adventure game in it (forget what, precisely) published in the UK, and in the notes for C64, it required one GOSUB (goS) and one RETURN (reT) to be abbreviated purely due to line length.

PhilReynoldsLondonGeek

I bought Jim Butterfield's book to learn Assembly to directly program the graphics on my C128 in 80 column 2MHz mode.

billkeithchannel

So those abbreviations are a quirk, which saves us typing long BASIC commands and saves instructions in the tokenizer. Very cool!
Would you please explain one day how updating BASIC lines works, and how lines are inserted between others? That must be rather interesting.
Did, for example, saving and reloading speed things up? Or is saving just a simple memory dump of the crunched BASIC code in RAM?
So many questions!

MeriaDuck

Cool bug! Mine is when you put a color character in a rem statement and it gets parsed. For instance:
10 REM (shift insert/backspace) (ctrl-1) <--- display an inverse spade character
becomes "10 rem ATN" when you list it. I thought it would be cool to have color rem statements back in the day and found out that it doesn't work that way.

_Hertz
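
The REM effect above can be sketched from the listing side. This is a rough model, assuming that LIST expands any byte of $80 or higher outside quotes into a keyword and does not treat the text after REM specially, which is what the observation suggests. `TOKENS` is a deliberately partial table and `list_line` is a hypothetical name.

```python
# Partial token table, enough for this example only.
TOKENS = {0x8F: "rem", 0x99: "print", 0xC1: "atn"}

def list_line(body):
    """Render the tokenized body of one BASIC line as text."""
    out = []
    in_quotes = False
    for b in body:
        if b == 0x22:                      # a double quote toggles quote mode
            in_quotes = not in_quotes
        if b >= 0x80 and not in_quotes and b in TOKENS:
            out.append(TOKENS[b])          # expanded as a keyword
        elif b < 0x80:
            out.append(chr(b).lower())
        else:
            out.append(f"{{${b:02X}}}")     # placeholder for a graphics character
    return "".join(out)

# REM token, space, then the byte $C1 (a shifted "A"):
print(list_line(bytes([0x8F, 0x20, 0xC1])))        # rem atn
# The same byte inside quotes is left alone:
print(list_line(bytes([0x99, 0x22, 0xC1, 0x22])))  # print"{$C1}"
```

The byte $C1 is also the token for ATN, so outside quotes the listing cannot tell a stored graphics character from a keyword.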

I know it must seem odd to people who aren't touch typists, but for me it takes more time to use the two-key shortcuts than it does to type the whole 'word' out. Been typing for more than 40 years.

Vector_Ze

You worked with Jeri on the C64DTV? That's awesome! I have two of them. I have plans to modify one to have all the C64 ports, soldered to points on the board, to make a full working C64 out of it.
Excellent and very thorough explanation about BASIC commands here as always, so thanks! And I was wondering if you could do a video explaining how the C64 determines when it's trying to divide by zero so it knows to throw an error.

Skyfox

Another interesting thing might be to show how people used to hide BASIC commands, like turning "10 SYS4096" into "10 HELLO". If I recall, this was done by inserting special characters into a REM statement at the end of the line? Also, you could do something to stop the program from being fully listed, so that it stopped after the first line; again, this may have been done with a REM statement.

dwhxyz

Hey, fellow Canuck here from Kingston. I think what may be happening with the "go su" being interpreted as "mid$" is possibly due to missing the "to" after that. I don't think "go" by itself can do anything and the interpreter may not know what to do with it so it, as you pointed out, uses the last command it was on.

I think "goS" is the short form for "gosub".

It's funny, even after all these years, I still have all of these short forms memorized as I programmed the C64 quite a bit back in the day. I ran a BBS off of my C64 using EBBS at the time as well. Good times.

NeilRoy