Re: MDV Software Reclamation
Posted: Mon Mar 02, 2020 11:07 pm
For anyone looking through the CL source: I was perusing my old code for the BASIC interpreter, and what I do in the present is slightly different from what I did almost 30 years ago. I've mastered the split between lexical analysis and parsing a bit better now. If you take a peek at the PARSEcmds() function, you'll see that I first call getcmd() to grab the top-level token (which will inevitably be a keyword) and then do a string-equality test for each potential parse (i.e. is it a "for" or an "if", etc.). Once in a specific parse (say, for a "for" loop), I then call, in context, either getvar(), getsym(), or getkey(), since a "for" first has a variable followed by an equal "=" sign, and so on. It's the last function, getkey(), that could be confusing, because getcmd() and getkey() really do much the same thing. The getkey() function does optimize a bit, since it is not returning the keyword but taking it in as a parameter and checking that it is there as expected (so it eats the characters from the input but doesn't need to store and return them).
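In rough shape it looked like this -- a compilable toy sketch of the idea, not the actual CL source (the helper bodies here are my guesses from memory):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

static const char *src;                 /* cursor into the input line */

static void skipws(void) { while (isspace((unsigned char)*src)) src++; }

static void getcmd(char *out)           /* grab the next keyword/word */
{
    skipws();
    while (isalpha((unsigned char)*src)) *out++ = *src++;
    *out = '\0';
}

static void getvar(char *out) { getcmd(out); }  /* same lexing, different intent */

static void getsym(char c)              /* eat one expected symbol */
{
    skipws();
    if (*src == c) src++;               /* else: syntax error in real code */
}

static void getkey(const char *kw)      /* eat an expected keyword, no need to return it */
{
    skipws();
    size_t n = strlen(kw);
    if (strncmp(src, kw, n) == 0) src += n;
}

static void getnum(void)                /* eat a number (value ignored here) */
{
    skipws();
    while (isdigit((unsigned char)*src)) src++;
}

static void PARSEcmds(void)
{
    char cmd[32];
    getcmd(cmd);                        /* top-level token is a keyword */
    if (strcmp(cmd, "for") == 0) {      /* e.g. "for i = 1 to 10" */
        char var[32];
        getvar(var);                    /* the loop variable      */
        getsym('=');                    /* the "=" we expect next */
        getnum();                       /* start value            */
        getkey("to");                   /* the "to" keyword       */
        getnum();                       /* limit                  */
        printf("parsed a for loop over '%s'\n", var);
    } else if (strcmp(cmd, "if") == 0) {
        /* ... condition, "then", body ... */
    }
}

int main(void)
{
    src = "for i = 1 to 10";
    PARSEcmds();
    return 0;
}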
When I write a parser nowadays I create a generic getToken() function (and I camelCase all of my variable and function names :-/) that returns the next token (a keyword, symbol, or variable), which then gets parsed in context. The CL prototype was my first crack at writing an interpreter, and I recall having a lot of fun doing it, especially on the QL, while I was waiting to save up the $2000+ to buy my first color Macintosh (yup, they were pricey then). The only thing I really don't like about my old code is that I have both getcmd() and getkey(), both doing much the same thing, since a command is a keyword and vice versa; otherwise I'm OK with it, since it does optimize things a bit. What really should happen is that a getToken() function ought to grab each token and tokenize it into a numeric value (via a lookup table), so that when we parse we do an integer comparison rather than a string equality (which is costly) -- at least for a recursive descent parser such as this. If you use a tool such as lex/yacc, then you parse and execute separately and build an internal parse tree (also pretty fun if you've never done it).
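Something like this toy sketch (the token names and keyword table are made up for illustration, not from CL):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Each token becomes a small integer, so the parser compares ints. */
enum Token { T_EOF, T_FOR, T_IF, T_PRINT, T_IDENT, T_NUMBER, T_EQUALS };

static const char *keywords[]         = { "for", "if", "print" };
static const enum Token keycodes[]    = { T_FOR, T_IF, T_PRINT };

static const char *src;                 /* cursor into the input  */
static char lexeme[32];                 /* text of the last token */

static enum Token getToken(void)
{
    while (isspace((unsigned char)*src)) src++;
    if (*src == '\0') return T_EOF;
    if (*src == '=') { src++; strcpy(lexeme, "="); return T_EQUALS; }
    if (isdigit((unsigned char)*src)) {
        char *p = lexeme;
        while (isdigit((unsigned char)*src)) *p++ = *src++;
        *p = '\0';
        return T_NUMBER;
    }
    if (isalpha((unsigned char)*src)) {
        char *p = lexeme;
        while (isalpha((unsigned char)*src)) *p++ = *src++;
        *p = '\0';
        for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
            if (strcmp(lexeme, keywords[i]) == 0)  /* one strcmp, at lex time only */
                return keycodes[i];
        return T_IDENT;
    }
    return T_EOF;                       /* bail on anything unexpected */
}

int main(void)
{
    src = "for i = 10";
    for (enum Token t = getToken(); t != T_EOF; t = getToken())
        printf("token %d (%s)\n", (int)t, lexeme);
    return 0;
}

The parser then just switches on the integer -- switch (getToken()) { case T_FOR: ... } -- and strcmp() never appears at parse time.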
Of course, I never even wrote the context-free grammar for my language. That's the first thing one ought to do before writing the parser. I guess I kind of knew what I wanted: a sort of cross between ZX81 BASIC and SuperBASIC. I think for fun I'll try to create the grammar and see what it looks like, because it will be easier to figure out how to change the parser if a grammar is associated with it (especially since I may want to convert it to pure ZX81 BASIC for creating a simulator). I usually write it in BNF, which has odd syntax since it uses "<" and ">" for its non-terminals (confusing in the post-mid-90s web world where that syntax was usurped by HTML tags -- BNF dates from the 60s). My grammar starts out something like:
<CL-BASIC> ::= <code>
<code> ::= <statement> | <statement> <code>
<statement> ::= <linenum> <basic-stmt>
<statement> ::= <shell-stmt>
...
Might require a little thought around the line-numbered vs. non-line-numbered cases, since BASIC statements can also occur without line numbers and shell commands can carry them.
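One way it might factor out -- just a sketch, nothing settled:

<statement> ::= <linenum> <stmt-body> | <stmt-body>
<stmt-body> ::= <basic-stmt> | <shell-stmt>

with the interpreter storing anything line-numbered into the program and executing the rest immediately, ZX81-style.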