Friday, January 6, 2012

011 - Learning libraries

I got asked to show how I go about figuring out how libraries work, and to explain exactly how I make use of Factor resources (the documentation, the walker, the inspector, etc) to achieve that goal.

The example I was asked to use was to be able to use the FTP library to retrieve a file from a password-protected FTP server. Here we go.

N.B.: I use "to drill down" to mean something a bit hard to explain in words - the action of browsing the source code of a word in the Browser and progressively clicking each referenced word to browse its source code in turn; this is interactive code spelunking.


  1. To brush up on FTP commands, I googled for: ftp unix tutorial password download. Turns out it's very simple.
  2. Switched right to the Factor docs, searched for FTP, clicked on ftp.client.
  3. Looked at the list of commands, trying to make sense of it. A good way to make sense of a library is to find its basic building blocks - the least abstract functions that provide the support for the higher-level ones.
  4. Since the command to download a file is GET, I clicked on ftp-get and drilled down to both with-ftp-client and ftp-cwd.
  5. I saw that ftp-cwd calls ftp-command. This looks like the basic building block I was looking for. Its source confirms that this is very concrete (i.e. it does something I can promptly understand - namely, send a command and read the response).
  6. I'm done looking at ftp-get. I want to find something that has to do with passwords.
  7. ftp-password doesn't take a password, only a url, so the url must contain the password already. I try "ftp://user:pwd@ftp.server.com/" >ftp-url inspect and verify that the password is indeed taken from the URL itself.
  8. Tried "ftp://user:pwd@ftp.server.com/folder/file.ext" ftp-get - got an error saying it had trouble saving the file to disk - permission was denied. This makes sense since I never specified a local path to save the file to, but I see the error says it tried to save the file to my root folder. I suspect this is set in some dynamic variable, but I have no clue really. I need to find out where this function tries to save the files it GETs, or how it builds the local path.
  9. Drilled down ftp-get > (ftp-get) - saw it uses but otherwise it's a dead-end. I'll have to walk the code to see exactly what is fed to .
  10. Walked the code and found out it uses "file.ext" for the local-path. This doesn't look right, unless does more than meets the eye.
  11. Looked at the docs for , but they don't mention anything helpful to me. I look at its source down below and click on normalize-path - aha! The docs say this function prepends current-directory to its argument. I did suspect this was going to be implicitly handled by a dynamic variable.
  12. Tried "/Users/user1/Desktop/" current-directory [ "ftp://user:pwd@ftp.server.com/folder/file.ext" ftp-get ] with-variable and it worked fine.
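A side note on step 7: that credentials can ride along in the URL is not Factor-specific - any URL parser will split them out. As a quick cross-check (in Python, simply because it's handy for this), using the same placeholder URL as above:

```python
from urllib.parse import urlsplit

# The same placeholder URL used in the walkthrough above.
url = "ftp://user:pwd@ftp.server.com/folder/file.ext"
parts = urlsplit(url)

print(parts.username)  # -> user
print(parts.password)  # -> pwd
print(parts.path)      # -> /folder/file.ext
```

This is exactly the structure >ftp-url pulls apart on the Factor side.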


I hope this helps.

Friday, December 9, 2011

99 bottles

Here's my version of this classic:
USING: combinators io kernel math
math.parser quotations sequences ;
IN: bottles

: bottle(s) ( n -- s ) 1 = "bottle " " bottles " ? ;

: (line) ( bottle(s) -- string )
dup {
{ 1 [ bottle(s) "1 " ] }
{ 0 [ bottle(s) "No more" ] }
[ bottle(s) swap number>string ]
} case swap "of beer" 3append ;

: (line-1) ( bottle(s) -- str )
(line) " on the wall" append ;

: line-1 ( bottle(s) -- )
(line-1) "," append print ;

: line-2 ( bottle(s) -- )
(line) "." append print ;

: line-3 ( bottle(s) -- )
0 = "Go to the store, buy some more,"
"Take one down, pass it around," ? print ;

: line-4 ( bottle(s) -- )
1 - dup 0 < [ drop 99 ] when
(line-1) "!\n" append print ;

: sing-lines ( x seq -- )
[ 1quotation ] map [ call( x -- ) ] with each ;

: sing-verse ( bottle(s) -- )
{ line-1 line-2 line-3 line-4 } sing-lines ;

: sing-song ( -- )
100 iota <reversed> [ sing-verse ] each ;

MAIN: sing-song

Thursday, December 8, 2011

Step-debugging parsing words

With all the macro play posts, I would have gone insane by now if I had been juggling the stack in my head.

So I wrote the equivalent of break but for parsing words. The reason simply using break (or its always-available alias B) doesn't work for parsing words is that the walker expands them before you have a chance to step into them.
So here's B's brother, B: :
USING: definitions kernel parser tools.continuations ;
IN: syntax-break

SYNTAX: B: scan-word definition
[ break "now press O I" drop ]
prepose call( accum -- accum ) ;

Its definition is simple: it scans the next word, gets its definition (aka source code) as a quotation, prepends break to it, and then calls it. That's exactly what I was doing manually before I wrote this. :P

As a bonus, and to make it self-explanatory, I added a little "live" documentation to it (they're instructions, actually) so that I don't forget how to use it.

Oh, and I have to use call( instead of call because the quotation I'm calling isn't known at compile time.

Try it out:
IN: scratchpad "world" "hello" B: [| w h | h ", " w 3append ] call .

And then follow the instructions (hint: you'll land inside B:'s definition still, right at the "now press O I" drop part. Guess what happens when you press O and then I.)
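For readers who don't know Factor, the move B: makes - fetch a definition, prepend a breakpoint, call the result - is the same move as wrapping a function in any other language. Here's a rough Python analogy (the hook stands in for break; all names here are mine, not Factor's):

```python
def prepose(hook, f):
    """Return a function that runs hook first, then f -
    roughly what `break prepose` does to a quotation."""
    def wrapped(*args, **kwargs):
        hook()                      # e.g. dropping into a debugger
        return f(*args, **kwargs)
    return wrapped

calls = []
greet = prepose(lambda: calls.append("break"),
                lambda w, h: h + ", " + w)

print(greet("world", "hello"))  # -> hello, world
print(calls)                    # -> ['break']
```

The real B: differs in that it operates at parse time on the parser's accumulator, but the prepose idea is the same.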

Indexed stack shufflers - part 4 (final)

Let's finish this chapter at once.
Here's a less flippant implementation of the stack shufflers we've been discussing.
USING: fry locals.parser locals.types math.parser math.ranges
multiline namespaces sequences ;
IN: shortlocals

: >n-locals ( n -- seq ) [1,b] [ number>string ] map ;

: define-n-locals ( accum n -- accum )
>n-locals '[ _ [ make-local ] map ] H{ } make-assoc ! words hash
(parse-lambda) <lambda> ?rewrite-closures append! ;

SYNTAX: [2> 2 define-n-locals ;
SYNTAX: [3> 3 define-n-locals ;

And here's how it works (notice how it plays nice with other local-introducing parse-words):
IN: scratchpad 2 9 [2> 2 1 / ] call .
4+1/2
IN: scratchpad 2 9 [2> 2 1 / 3 + ] call .
7+1/2
IN: scratchpad 2 9 100 [3> 2 1 / 3 + ] call .
104+1/2
IN: scratchpad 2 9 100 [3> 3 [| a | a 1 / ] call 2 1 3array ] call .
{ 50 9 2 }

define-n-locals is defined exactly as [| is, except we skip the first step inside parse-lambda (i.e. we skip parse-local-defs) and instead simply provide the sequence of strings to build locals from.

I also consulted with the Factor Gods and they told me it's impossible to write a version of [2> that simply adds " [| 1 2 | " to the code (which would be sort of the equivalent of C-style "macros", I think).

To which I replied that it's impossible except for one very dirty way of doing it. I'll save that for a future post, when I'm feeling flippant again.

Sunday, December 4, 2011

Indexed stack shufflers - part 3

Now that I've covered some background of how parsing works in general and superficially covered locals, I can explain how the above stack shuffler works. Here's a more cleaned-up version of the code followed by a numbered explanation of what it does (the numbers on the list refer to the numbers in the code, of course):

! 3 -> { "arg-1" "arg-2" "arg-3" }
: >params-n ( n -- vec )
1 + iota unclip drop [ number>string "arg-" prepend ] map ;

GENERIC: >number ( str -- n )
M: string >number string>number ;

: str-is-int? ( str -- ? ) >number integer? ;

SYNTAX: [n>
[let 0 :> largest!
"]" [ dup str-is-int? ! 3. (below)
[ dup string>number largest max largest! "arg-" prepend ]
[ ] if ! 2.
] map-tokens ! 1.
largest ! -- tokens: { "arg-2" "arg-1" "/" } 2
] >params-n ! 4. -- tokens { "arg-1" "arg-2" }
'[ [ _ [ make-local ] map ] H{ } make-assoc ] ! 5. -- tokens [ quot ]
call rot ! 5. -- vars: { arg-1 arg-2 }
! assoc: H{ { "arg-1" arg-1 } { "arg-2" arg-2 } }
! tokens
'[ _ [ parse-datum ] map >quotation ] ! 6. -- vars assoc [ quot ]
((parse-lambda)) ! 6. -- vars [ arg-2 arg-2 / ]
<lambda> ! 7. -- [| arg-1 arg-2 | arg-2 arg-1 ]
?rewrite-closures append! ; ! 8.

Target: [n> 2 1 / ]

  1. Read all tokens until "]"

  2. For each token, if it's an integer n then convert it to "arg-${n}"

  3. While doing step 2, also keep track of the largest number seen

  4. Generate a list of n "arg-${n}" strings where n is the largest number seen

  5. Create a new blank namespace (an assoc) and fill it with words named as the "arg-n" strings

  6. Parse the tokens:

    1. Push assoc to manifest (it becomes the first namespace in the lookup path)
    2. Parse the tokens
    3. Convert the list of parsed tokens into a quotation

  7. Wrap variables and quotation into a lambda tuple

  8. Rewrite lambda so that every reference to a variable becomes an interaction with the retain stack; add the resulting quotation to the parse tree.
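Steps 1-4 are plain string manipulation, independent of Factor's parser, so here is that half of the recipe sketched in Python (just an illustration of the steps above, not a port of the code; it only handles non-negative integer tokens, as the original does):

```python
def rewrite_tokens(tokens):
    """Steps 1-4: turn integer tokens into "arg-n" names and
    build the parameter list from the largest index seen."""
    largest = 0
    out = []
    for tok in tokens:
        if tok.isdigit():                     # step 2: integer token?
            largest = max(largest, int(tok))  # step 3: track largest
            out.append("arg-" + tok)
        else:
            out.append(tok)
    # step 4: one "arg-n" parameter per index up to the largest
    params = ["arg-%d" % n for n in range(1, largest + 1)]
    return out, params

print(rewrite_tokens(["2", "1", "/"]))
# -> (['arg-2', 'arg-1', '/'], ['arg-1', 'arg-2'])
```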

Here's an example usage:
2 9 [n> 2 1 / ] call . ! => 4+1/2
Hooray!

Now here is a way to break it:
2 9 [n> 2 1 [| a b | b a / ] ] call . ! => should return 2/9
Awww!

The line above breaks our word because we parse the tokens too naively: we match on the first "]", which is not our own. There are several problems with our approach, all stemming from the fact that we need to read ahead to know the largest number before we can actually begin parsing anything, yet reading raw tokens isn't adequate, since we stop at the first "]" and not at the matching \ ] .

What we'd need to do to fix this is to actually parse until \ ] (the word parse-until also executes all parsing words it comes across, so we'd have one less thing to worry about), then find the largest number, then reset the lexer head to where we started, then parse again - but then when would we be able to change the numbers into arg-1 arg-2 etc? And even worse, this would execute parsing words twice, which can be a bad thing. Even if this worked, it would only work with numbers because they always parse, but what if we wanted [x> y x / ] ? We'd have no way to write that because the first parse would fail.

No, we need a better way. Parsing until \ ] won't do because of the reasons above, so we need to go to a lower level. Maybe we have to go down to reading tokens, but the inherent problem with that is we always stop at the first "]". We need to go to a lower level still.

But I'm not convinced this problem is a low-level one. Let's think again from another perspective. What would be the ideal scenario? Well, I'd really like to use parse-until since it executes the parsing words for me, and thus handles the balancing of the brackets automatically. Why can't I use it, again? Ah, because it parses tokens and throws an error if a word is not found; and because there's no room to turn numbers into arg-n words (or to otherwise turn non-words into words).
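The naive-versus-matching-bracket problem is easy to show in miniature. Here's a Python sketch of the two token readers (everything Factor-specific is ignored, and the opener test is deliberately crude):

```python
def read_naive(tokens):
    """Stop at the first "]" - effectively what map-tokens does."""
    out = []
    for tok in tokens:
        if tok == "]":
            return out
        out.append(tok)
    return out

def read_balanced(tokens):
    """Stop at the "]" matching our own opener, tracking nesting."""
    depth, out = 0, []
    for tok in tokens:
        if tok == "]":
            if depth == 0:
                return out
            depth -= 1
        elif tok.startswith("["):  # crude: any [... token opens a level
            depth += 1
        out.append(tok)
    return out

# the body of: [n> 2 1 [| a b | b a / ] ] ...
body = ["2", "1", "[|", "a", "b", "|", "b", "a", "/", "]", "]"]
print(read_naive(body))     # stops too early, inside the inner quotation
print(read_balanced(body))  # keeps the inner [| ... ] intact
```

read_naive loses the inner quotation's closing bracket, which is exactly how the example above breaks.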

(a couple of hours later)

I'm starting to think this is impossible for the general case, and very time-consuming for the static case where we know the exact syntax of all parsing words that introduce and shadow local variables. The easy way around is to add a restriction by forward declaration:
2 9 [2> 2 1 / ] call . ! => would return 4+1/2 if implemented
I'll write the code for that later.

Saturday, December 3, 2011

Indexed stack shufflers - part 2

Here's the explanation I promised. First we need an understanding of what we're talking about: a parser word (equivalent of Common Lisp's reader macros) that allows me to reference the stack by index. The indexing can be whatever you want, but I have written two as examples; the first and my favorite is the "Natural" indexing where the TOS (top-of-stack) is n, where n is the arity of the quotation being built. That's right, this means this indexing is 1-based:

2 8 [ 2 1 / ] call ! => 4

The 1 inside the quotation means element <1> on the stack, and 2 means element <2>, i.e. the TOS - because the quotation is of arity 2. Maybe this will help:

foo <1> <2> <3> [ quotation of arity 3 ]
foo bar <1> <2> [ quotation of arity 2 ]
foo bar quux <1> [ quotation of arity 1 ]

The numbers above stand for which element on the stack those numbers will refer to inside the quotation. foo, bar and quux are other stack objects that aren't used by the quotation.

The second indexing I wrote was the one suggested on #concatenative, where TOS is indexed as 0 and the second object is 1, etc. I find that indexing hard to think with.
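To make the two conventions concrete, here they are as plain index arithmetic (Python used only for illustration; the stack is modeled as a list with the top at the end):

```python
def natural_pick(stack, arity, i):
    """Natural indexing: TOS is n where n = arity, so index 1
    is the deepest of the quotation's n inputs."""
    return stack[len(stack) - arity + (i - 1)]

def zero_pick(stack, i):
    """0-based indexing: TOS is 0, the object under it is 1, ..."""
    return stack[len(stack) - 1 - i]

stack = ["world", ", ", "hello", "!"]
# natural, arity 4: "world" is 1, ", " is 2, "hello" is 3, "!" is 4
print([natural_pick(stack, 4, i) for i in (3, 2, 1, 4)])
# -> ['hello', ', ', 'world', '!']
# 0-based: "!" is 0, "hello" is 1, ", " is 2, "world" is 3
print([zero_pick(stack, i) for i in (1, 2, 3, 0)])
# -> ['hello', ', ', 'world', '!']
```

Concatenating either result gives "hello, world!", matching the examples from the previous post.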

So the goal is to write a word like [ that allows the example line above to work. We'll create our own [ for each type of indexing: [n> for natural indexing (TOS = n, n = arity), and [0> for 0-based indexing (TOS = 0). My explanation will focus on [n> only, but it applies equally to [0> (the difference is factored out).

Basically, the effect we want to achieve is to turn [n> 2 1 ] into [| 1 2 | 2 1 ] .
Not only is this an accurate description of the effect achieved by the [n> I wrote (though not of how it achieves it), but that code is actually runnable:

2 9 [| 1 2 | 2 1 ] call 2array . ! => { 9 2 }

This is handy for me since I need to explain why this works before I explain [n> . It also means that we'll be looking at the definition of [| for inspiration (code thievery).

[| is a parser word - it takes control of the lexer. Parser words are defined with the SYNTAX: word and they can choose at what level they want to read ahead - the lowest level being one string and the highest being a sequence of parsed objects. In-between there's a way to read a sequence of strings, possibly modify them, and only then parse them. That's what [| does, and those strings are then called "raw tokens" because they haven't yet been parsed. As a way to make this clearer, this is how each of these 3 levels would see some piece of code:

! reading 2 dup "blah"

"2 dup \"blah\""
Lowest level - I used this to write HERELINE:
This is so low that the only built-in ways to read ahead are by char (column), by line or the whole source code. There's no nice way to say "oh, only read until you see a matching ]" - you'd have to read the whole source and balance the brackets yourself. Also, after you're done don't forget to move the lexer head to where you read the last char. So yes, this is as low as it gets.

{ "2" "dup" "\"blah\"" } - sequence of strings (raw tokens)
Intermediary level - [| uses this, and we have to as well, because we need the names of the local variables to set up some things so that when they're parsed, they won't throw an error saying those words can't be found.
example:
SYNTAX: %[ "]" parse-tokens suffix! ;
%[ 2 dup "blah" dup ] ! => { "2" "dup" "\"" "blah\"" }
(char 34 (the " character) is a special word (it can touch the next token), which is why the above looks weird)

{ 2 dup "blah" }
Highest level - everything has been parsed and if a word doesn't exist, an error will be thrown.
example:
SYNTAX: %[ \ ] parse-until suffix! ;
%[ 2 dup "blah" ] ! => V{ 2 dup "blah" }
%[ 2 dupe "blah" ] ! => error: "dupe" is not a defined word

Again, we'll be using the middle level, which is what [| uses.

This is what [| does, in recipe form, on the line "[| 1 2 | 2 1 ]":

1 - read tokens until "|" is found => { "1" "2" }

2 - create a word for each token, whose role is to fetch the respective position on the retain stack (this is a "local")

3 - ah, but those words are created in a new namespace

4 - now take this namespace and make it be the first in line for when we lookup words

5 - now go ahead and PARSE until \ ] (you see the trick? now it can parse instead of tokenize, since it read "1" and "2" and turned them into the words \ 1 and \ 2, which can now be found as words instead of numbers - the namespace we created with the definitions for \ 1 and \ 2 shadows even the fact that they're numbers (!). This is the point where I realize I should have written these examples with "a" and "b", so that it would be clearer why parsing can't throw a "no word found" error for \ a - the reason being that we would have just created an \ a word from "a" in a new namespace and placed that namespace first in line to be looked up when parsing. But if I had used "a" and "b" I'd have no reason to be explaining all this. So there.)

6 - we're done! (handwavingly so)
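The shadowing in step 5 boils down to scope-chain lookup with a numeric fallback, which fits in a few lines of Python (the scope chain and all names here are my invention, purely for illustration):

```python
def resolve(token, scopes):
    """Try each scope in order - the locals namespace comes first -
    and only fall back to reading the token as a number when no
    scope claims it."""
    for scope in scopes:
        if token in scope:
            return scope[token]
    return int(token)  # no scope claims it: it's just a number

locals_ns = {"1": "local-1", "2": "local-2"}  # made in steps 2-3
print(resolve("1", [locals_ns]))  # the local shadows the number 1
print(resolve("7", [locals_ns]))  # unclaimed token parses as a number
```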

Ok, now you understand how [| works. So why didn't I write my [n> word to simply output the equivalent [| code? Because I have no clue how to write a SYNTAX: word that builds code from other SYNTAX: words. So I took what I believe (hope?) was the long way around, but I learned some useful stuff on the way so it was a good thing.

To be continued...

Thursday, December 1, 2011

Indexed stack shufflers

Some 12 hours ago a guest on #concatenative shared this idea for a new syntax for stack shuffling:
"world" ", " "hello" "!" <1 2 3 0> 4append ! => "hello, world!"

I decided that would be a great challenge and an opportunity to level up my macro skills. Here it is:
USING: fry grouping kernel lexer locals locals.parser
locals.types make math math.order math.parser multiline
namespaces parser quotations sequences vocabs.parser words ;
IN: shortlocals

! 4 -> { "arg-3" "arg-2" "arg-1" "arg-0" }
: >params-0 ( n -- vec )
1 + iota reverse [ number>string "arg-" prepend ] map ;

! 3 -> { "arg-1" "arg-2" "arg-3" }
: >params-n ( n -- vec )
1 + iota unclip drop [ number>string "arg-" prepend ] map ;

! like \ supremum but for seqs with non-numbers intermixed
: max-num ( seq -- n )
[ number? ] filter supremum ;

SYNTAX: [0>
[let 0 :> largest!
"]"
[
! = modded parse-datum
dup string>number [ dup string>number largest max largest! "arg-" prepend ] [ ] if
] map-tokens
largest
] >params-0 ! -- words params
'[ [ _ [ make-local ] map ] H{ } make-assoc ] dip
'[ _ [ dup search [ ] [ dup string>number [ ] [ no-word ] ?if ] ?if ] map >quotation ]
((parse-lambda))
<lambda> ?rewrite-closures append! ;
! final version of 0[

SYNTAX: [n>
[let 0 :> largest!
"]"
[
! = modded parse-datum
dup string>number [ dup string>number largest max largest! "arg-" prepend ] [ ] if
] map-tokens
largest
] >params-n ! -- words params
'[ [ _ [ make-local ] map ] H{ } make-assoc ] dip
'[ _ [ dup search [ ] [ dup string>number [ ] [ no-word ] ?if ] ?if ] map >quotation ]
((parse-lambda))
<lambda> ?rewrite-closures append! ;



! top-of-stack is 0, below that is 1, and so forth
"world" ", " "hello" "!" [0> 1 2 3 0 3append append ] call . ! => "hello, world!"

! natural order: "world" is 1, the comma is 2, "hello" is 3 and "!" is 4:
"world" ", " "hello" "!" [n> 3 2 1 4 3append append ] call . ! => "hello, world!"

It's still a bit of a mess, as I haven't refactored it yet, and there's some code I copied from other words' definitions instead of calling the word itself (I was changing some bits but ended up mostly leaving it the way it was, without undoing the mess).
I'll explain the code and what I learned from the experience tomorrow.