Sunday, December 4, 2011

Indexed stack shufflers - part 3

Now that I've covered some background of how parsing works in general and superficially covered locals, I can explain how the above stack shuffler works. Here's a more cleaned-up version of the code followed by a numbered explanation of what it does (the numbers on the list refer to the numbers in the code, of course):

! 3 -> { "arg-1" "arg-2" "arg-3" }
: >params-n ( n -- vec )
1 + iota unclip drop [ number>string "arg-" prepend ] map ;

GENERIC: >number ( str -- n )
M: string >number string>number ;

: str-is-int? ( str -- ? ) >number integer? ;

SYNTAX: [n>
[let 0 :> largest!
"]" [ dup str-is-int? ! 3. (below)
[ dup string>number largest max largest! "arg-" prepend ]
[ ] if ! 2.
] map-tokens ! 1.
largest ! -- tokens: { "arg-2" "arg-1" "/" } 2
] >params-n ! 4. -- tokens { "arg-1" "arg-2" }
'[ [ _ [ make-local ] map ] H{ } make-assoc ] ! 5. -- tokens [ quot ]
call rot ! 5. -- vars: { arg-1 arg-2 }
! assoc: H{ { "arg-1" arg-1 } { "arg-2" arg-2 } }
! tokens
'[ _ [ parse-datum ] map >quotation ] ! 6. -- vars assoc [ quot ]
((parse-lambda)) ! 6. -- vars [ arg-2 arg-2 / ]
<lambda> ! 7. -- [| arg-1 arg-2 | arg-2 arg-1 ]
?rewrite-closures append! ; ! 8.

Target: [n> 2 1 / ]

  1. Read all tokens until "]"

  2. For each token, if it's an integer n then covert it to "arg-${n}"

  3. While doing step 2, also keep track of the largest number seen

  4. Generate a list of n "arg-${n}" strings where n is the largest number seen

  5. Create a new blank namespace (an assoc) and fill it with words named as the "arg-n" strings

  6. Parse the tokens:

    1. Push assoc to manifest (it becomes the first namespace in the lookup path)
    2. Parse the tokens
    3. Convert the list of parsed tokens into a quotation

  7. Wrap variables and quotation into a lambda tuple

  8. Rewrite lambda so that every reference to a variable becomes an interaction with the retain stack; add the resulting quotation to the parse tree.

Here's an example usage:
2 9 [n> 2 1 / ] call . ! => 4+1/2
Hooray!

Now here is a way to break it:
2 9 [n> 2 1 [| a b | b a / ] ] call . ! => should return 2/9
Awww!

The reason why the line above breaks our word is because we're parsing the tokens too naively. The specific reason it breaks is because it matches on the first "]" which is not its own. There are several problems with our approach, all stemming from the fact that we need to read ahead to know the largest number before we can actually begin parsing anything, but reading raw tokens isn't adequate since we stop at the first "]" and not the matching \ ] .

What we'd need to do to fix this is to actually parse until \ ] (the word parse-until also executes all parsing words it comes across, so we'd have one less thing to worry about), then find the largest number, then reset the lexer head to where we started, then parse again - but then when would we be able to change the numbers into arg-1 arg-2 etc? And even worse, this would execute parsing words twice, which can be a bad thing. Even if this worked, it would only work with numbers because they always parse, but what if we wanted [x> y x / ] ? We'd have no way to write that because the first parse would fail.

No, we need a better way. Parsing until \ ] won't do because of the reasons above, so we need to go to a lower level. Maybe we have to go down to reading tokens, but the inherent problem with that is we always stop at the first "]". We need to go to a lower level still.

But I'm not convinced this problem is a low-level one. Let's think again from another perspective. What would be the ideal scenario? Well, I'd really like to use parse-until since it executes the parsing words for me, and thus handles the balancing of the brackets automatically. Why can't I use it, again? Ah, because it parses tokens and throws an error if a word is not found; and because there's no room to turn numbers into arg-n words (or to otherwise turn non-words into words).

(a couple of hours later)

I'm starting to think this is impossible for the general case, and very time-consuming for the static case where we know the exact syntax of all parsing words that introduce and shadow local variables. The easy way around is to add a restriction by forward declaration:
2 9 [2> 2 1 / ] call . ! => would return 4+1/2 if implemented
I'll write the code for that later.

No comments:

Post a Comment