Saturday, December 3, 2011

Indexed stack shufflers - part 2

Here's the explanation I promised. First we need an understanding of what we're talking about: a parser word (equivalent of Common Lisp's reader macros) that allows me to reference the stack by index. The indexing can be whatever you want, but I have written two as examples; the first and my favorite is the "Natural" indexing where the TOS (top-of-stack) is n, where n is the arity of the quotation being built. That's right, this means this indexing is 1-based:

2 8 [ 2 1 / ] call ! => 4

The 1 inside the quotation means element <1> on the stack, and 2 means element <2>, i.e. the TOS - because the quotation is of arity 2. Maybe this will help:

foo <1> <2> <3> [ quotation of arity 3 ]
foo bar <1> <2> [ quotation of arity 2 ]
foo bar quux <1> [ quotation of arity 1 ]

The numbers above stand for which element on the stack those numbers will refer to inside the quotation. foo, bar and quux are other stack objects that aren't used by the quotation.

The second indexing I wrote was the one suggested on #concatenative, where TOS is indexed as 0 and the second object is 1, etc. I find that indexing hard to think with.

So the goal is to write a word like [ that allows the example line above to work. We'll create our own [ for each type of indexing: [n> for natural indexing (TOS = n, n = arity), and [0> for 0-based indexing (TOS = 0). My explanation will focus on [n> only, but it applies equally to [0> (the difference is factored out).

Basically, the effect we want to achieve is to turn [n> 2 1 ] into [| 1 2 | 2 1 ] .
Not only is this an accurate explanation of the effect that the [n> I wrote achieves (though it's not what it does), but that code is actually runnable:

2 9 [| 1 2 | 2 1 ] call 2array . ! => { 9 2 }

This is handy for me since I need to explain why this works before I explain [n> . It also means that we'll be looking at the definition of [| for inspiration (code thievery).

[| is a parser word - it takes control of the lexer. Parser words are defined with the SYNTAX: word and they can choose at what level they want to read ahead - the lowest level being one string and the highest being a sequence of parsed objects. In-between there's a way to read a sequence of strings, possibly modify them, and only then parse them. That's what [| does, and those strings are then called "raw tokens" because they haven't yet been parsed. As a way to make this clearer, this is how each of these 3 levels would see some piece of code:

! reading 2 dup "blah"

"2 dup \"blah\""
Lowest level - I used this to write HERELINE:
This is so low that the only built-in ways to read ahead are by char (column), by line or the whole source code. There's no nice way to say "oh, only read until you see a matching ]" - you'd have to read the whole source and balance the brackets yourself. Also, after you're done don't forget to move the lexer head to where you read the last char. So yes, this is as low as it gets.

{ "2" "dup" "\"blah\"" } - sequence of strings (raw tokens)
Intermediary level - [| uses this and we have to as well because we need the names of the local variables to setup some things so that when they're parsed, they won't throw an error saying those words can't be found.
example:
SYNTAX: %[ "]" parse-tokens suffix! ;
%[ 2 dup "blah" dup ] ! => { "2" "dup" "\"" "blah\"" }
(char 34 (the " character) is a special word (it can touch the next), that's why the above looks weird)

{ 2 dup "blah" }
Highest level - everything has been parsed and if a word doesn't exist, an error will be thrown.
example:
SYNTAX: %[ \ ] parse-until suffix! ;
%[ 2 dup "blah" ] ! => V{ 2 dup "blah" }
%[ 2 dupe "blah" ] ! => error: "dupe" is not a defined word

Again, we'll be using the middle level, which is what [| uses.

This is the [| does, in recipe form, on the line "[| 1 2 | 2 1 ]":

1 - read tokens until "|" is found => { "1" "2" }

2 - create a word for each token, whose role is to fetch the respective position on the retain stack (this is a "local")

3 - ah, but those words are created in a new namespace

4 - now take this namespace and make it be the first in line for when we lookup words

5 - now go ahead and PARSE until \ ] (you see the trick? now it can parse instead of tokenize since it read "1" and "2" and turned them into the words \ 1 and \ 2 which can now be found as words instead of numbers, since the namespace we created with the definition for \ 1 and \ 2 shadow even the fact that they're numbers (!). This is the time I realize I should be writing these examples with "a" and "b" so that it would be clearer now why it can't throw a "no word found" for \ a - the reason being that we would have just created an \ a word from "a" in a new namespace and have placed that new namespace as the first one to be looked up when parsing. But if I had used "a" and "b" I'd have no reason to be explaining all this. So there.)

6 - we're done! (handwavingly so)

Ok, now you understand how [| works. So why didn't I write my [n> word to simply output the equivalent [| code? Because I have no clue how to write a SYNTAX: word that builds code from other SYNTAX: words. So I took what I believe (hope?) was the long way around, but I learned some useful stuff on the way so it was a good thing.

To be continued...

No comments:

Post a Comment