Thursday, January 29, 2009

1x Forth

http://www.ultratechnology.com/1xforth.htm

1x Forth
Charles Moore
April 13, 1999


I asked Chuck Moore to let me videotape him commenting on the evolution of his language over the last fifteen years. We set up a camcorder and he talked about many of the ideas that I have quoted him on at this website, but this time with a focus on Forth rather than solid state physics, or VLSI CAD, or his chip designing and debugging experiences. I have felt that he has talked about the Forth language in his speeches over the years but has covered so many other topics that I wanted to get his comments on the language that he invented more than thirty years ago.

The entire 54 minute video is available in the UltraTechnology store.

I introduce Chuck.

(Jeff Fox) I've asked Chuck Moore to give a presentation today on Forth. I've asked him to talk about his experience with his language for the last fifteen years. So if I can introduce him, Charles Moore.



(Charles Moore) That's quite a broad topic. How long are we talking?

(Jeff) The tape is one hour.

(Chuck) Fifteen years; that just about covers my experience with computers as opposed to software. Going back fifteen years the motivation for switching the emphasis from Forth the language to Forth the microprocessor was twofold.

First the software problems were solved. It was easy to write applications, trivial to write applications. All the problems lay in the hardware. Hardware was awkward and messy and unreliable especially if you were dealing with a custom system that someone had built for which they wanted custom software. It became a real drag to try to debug the hardware for them. And it became clear that the hardware engineers weren't doing a very good job. Better than the software engineers in the industry but not as good as the software engineers in Forth. So I thought I would see what I could do to address the hardware problem. That might have been a mistake. Forth is a lot of fun to work with, hardware is not so much fun.

I don't know if you are all aware of the history but the first Forth processor that I did, it might have been the first one of all, was Novix, which was 16-bit and state of the art as far as speed is concerned, meaning 8 MIPS. It was a lot of fun to work with. We had a very nice Forth on it, cmForth. Much smaller and simpler than the other Forths I had been doing for Forth Inc. And then of course ShBoom, which was 32-bits and 50 MIPS. And now the i21 is the latest incarnation which is 20-bits and I like to claim 500 MIPS.

Each of these had its own kind of Forth associated with it. The goal was very simple: to minimize the complexity of the hardware software combination. As far as I can see no-one else is doing that. Some lip service perhaps, but no-one is trying to minimize the complexity of anything and that is a great concern to me.

We are building a culture which cannot survive as trivial an incident as Y2K. Once we lose the billion dollar fabrication plants and once we lose the thousand man programming teams how do we rebuild them? Would we bother to rebuild them? Are computers worth enough to demand the social investment that we put into them? It could be a lot simpler. If it were a lot simpler I would have a lot more confidence that the technology would endure into the indefinite future.

There seems to be a propensity to make things complicated. People love to make them complicated. You can see that in television programming, you can see it in products in the marketplace, you can see it in internet web sites. Simply presenting the information is not enough, you have got to make it engaging. I think there is perhaps an optimal level of complexity that the brain is designed to handle. If you make it too simple people are bored; if you make it too complicated they are lost.

I'm never bored by simplicity. Show me a simpler way to do anything that I'm doing. I will jump on it. But that doesn't seem to be the case. Automobiles, airplanes, spacecraft: immensely complicated devices. Forth doesn't need to be complicated. Classic Forth started out simple, it gradually accreted layers of complexity. At Forth Inc. that kind of became the company culture. We had this package and we were selling it and we were exploiting it and we were stuck with it. When I left Forth Inc. I had a chance to simplify and cmForth was the result. Unfortunately cmForth didn't get used for a lot of applications. So it is hard to tell how well it would have worked out.

The Forth I did on ShBoom was as I recall pretty much like cmForth. The biggest application that I did was to make video work and to start the process of the custom silicon design package so I know the approach worked well with ShBoom.

i21, the Forth I am using there is Color Forth and I haven't done any significant applications on it yet. It is brutally simple. It is simpler than any of its predecessors and I can say something about it and why.

The i21 itself is a simple processor. Not as simple as it could be. Because it seemed to me that the only hope we had in selling a processor was to make it fast. So I have added a lot of complexity in pursuit of performance. Hopefully I have found some kind of reasonable balance. Five hundred MIPS is very nice speed. We can only do bursts at the moment but it will be sustained someday. The problem is that there are no applications that require that much speed or require a large number of processors running at that speed. It is a hard sell.

The most interesting application that I have seen recently is the SETI at home project, where you can download data from Arecibo and process it for a couple of weeks on your PC and send it back to participate in this exploration. So maybe a faster processor would sell there simply for that application. Maybe a SETI distributed processor system may be the thing to do.

Once you've got a processor, hopefully one well suited to Forth, then what should Forth look like in order to exploit the processor? That raises the question of what is Forth? I have hoped for some time that someone would tell me what it was. I keep asking that question. What is Forth?

Forth is highly factored code. I don't know anything else to say except that Forth is definitions. If you have a lot of small definitions you are writing Forth. In order to write a lot of small definitions you have to have a stack. Stacks are not popular. It's strange to me that they are not. There is just a lot of pressure from vested interests that don't like stacks, they like registers. Stacks are not a solve-all-problems concept but they are very very useful, especially for information hiding, and you have to have two of them.

These ideas have been around for thirty years now. I have been promoting them for thirty years. Their level of acceptance is about where it was thirty years ago. Perhaps even less considering that the industry has expanded so much.

Forth
Definitions
Stacks

So that is Forth. And this is a requirement to support definitions.

What is a definition? Well classically a definition was colon something, and words, and end of definition somewhere.

: some ~~~ ;

I always tried to explain this in the sense that this is an abbreviation: whatever string of words you have here that you use frequently, you give it a name and you can use it more conveniently. But it's not exactly an abbreviation because it can have a parameter perhaps or two. And that is a problem with programmers, perhaps a problem with all programmers; too many input parameters to a routine. Look at some 'C' programs and it gets ludicrous. Everything in the program is passed through the calling sequence and that is dumb.

A Forth word should not have more than one or two arguments. This stack which people have so much trouble manipulating should never be more than three or four deep.
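As a rough illustration of what small definitions with one or two arguments can look like, here is a minimal sketch in standard Forth (not Color Forth). The word names are invented for this example and do not come from the talk.

```forth
\ Each word takes one or two stack arguments and does one small thing.
\ The names below are hypothetical, chosen only for illustration.

\ square: replace the top number with its square
: square   dup * ;

\ sum-of-squares: replace two numbers with the sum of their squares
: sum-of-squares   square swap square + ;

3 4 sum-of-squares .   \ prints 25
```

Neither word ever needs to look deeper than the top two stack items.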

Our current incarnation of our word (:) is to make it red. That way you don't even use colon. This not only reduces the amount of text you have to store in your source file but it vastly clarifies what is going on. The red word is being defined,

some ~~~

the definition is green and it might have a semicolon in the definition which means return but it does not mean end of definition. It can have more than one return, and you can have more than one entry point in here if you want. Without this semicolon this definition would fall through into this definition and return at this point but still there is no state of you're in compile mode versus execute mode. You're either running green or you're running white background, black. Black means execute, green means compile, red means define.

This to me is simpler and more clear. It is brand new so it hasn't gotten any acceptance but we will see.

But as to stack parameters, the stacks should be shallow. On the i21 we have an on-chip stack 18 deep. This size was chosen as a number effectively infinite.

The words that manipulate that stack are DUP, DROP and OVER period. There's no ..., well SWAP is very convenient and you want it, but it isn't a machine instruction. But no PICK no ROLL, none of the complex operators to let you index down into the stack. This is the only part of the stack, these first two elements, that you have any business worrying about. Of course on a chip those are the two inputs to the ALU so those are what are relevant to the hardware.

The others are on the stack because you put them there and you are going to use them later after the stack falls back to their position. They are not there because you're using them now. You don't want too many of those things on the stack because you are going to forget what they are.

So people who draw stack diagrams or pictures of things on the stack should immediately realize that they are doing something wrong. Even the little parameter pictures that are so popular. You know if you are defining a word and then you put in a comment showing what the stack effects are and it indicates F and x and y

F ( x - y )

I used to appreciate this back in the days when I let my stacks get too complicated, but no more. We don't need this kind of information. It should be obvious from the source code or be documented somewhere else.

So the stack operations that I use are very limited. Likewise the conditionals. In Classic Forth we used

IF ELSE THEN

And I have eliminated ELSE.

I don't see that ELSE is as useful as the complexity it introduces would justify. You can see this in my code. I will have IF with a semicolon and then I will exit the definition at that point or continue.

IF ~~~ ; THEN
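The `IF ~~~ ; THEN` pattern relies on a semicolon that returns without ending the definition. Standard Forth has no such semicolon, but EXIT can stand in for it. The following is only a sketch under that assumption, and the word name is hypothetical.

```forth
\ Standard Forth sketch: EXIT plays the role of the returning semicolon.
\ .sign is a hypothetical name. It reports the sign of a number without ELSE.
: .sign   0< if ." negative" exit then ." zero or positive" ;

-5 .sign   \ prints negative
 7 .sign   \ prints zero or positive
```

The first branch returns early, so the code after THEN serves as the other arm of the two-way branch.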

I have the two way branch but using the new feature of a semicolon which does not end a definition. Likewise with loops, there were a lot of loop constructs. The ones I originally used were taken out of existing languages. I guess that is the way things evolve.

There was DO LOOP, there was FOR NEXT, and there was BEGIN UNTIL.

DO LOOP was from FORTRAN, FOR NEXT was from BASIC, BEGIN UNTIL was from ALGOL.

Which one do we pick for Forth? This (DO LOOP) has two loop control parameters and it is just too complicated. This (FOR NEXT) has one loop control parameter and is good with a hardware implementation and is simple enough to have a hardware implementation. And this one (BEGIN) has a variable number of parameters. Unfortunately.. (noise)

We are borrowing this recording facility from iTV and it is their vault. If you hear an echo in the sound system that's why, it's a vault.

I've got a new looping construct that I am using in Color Forth and that I find superior to all the others. That is that if I have a WORD I can have in here some kind of a conditional with a reference to WORD. And this is my loop.

WORD ~~~ IF ~~~ WORD ;
THEN ~~~ ;

I loop back to the beginning of the current definition. And that is the only construct that I have at the moment and it seems to be adequate and convenient. It has a couple of side effects. One is that it requires a recursive version of Forth. This word must refer to the current definition and not some previous definition. This eliminates the need for the SMUDGE/UNSMUDGE concept which ANS is talking about giving a new name. But the net result is that it is simpler.
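A loose approximation of this construct in standard Forth uses RECURSE to refer to the current definition, followed by EXIT. This is a sketch, not Color Forth: the word name is hypothetical, and a standard system normally compiles RECURSE as a real call, so each pass may consume a return stack cell.

```forth
\ countdown prints n, n-1, ... 0 by re-entering itself.
\ Standard IF consumes its flag, hence the DUP before it.
: countdown   dup .  dup if  1- recurse exit  then  drop ;

3 countdown   \ prints 3 2 1 0
```

The shape follows the slide: a conditional inside the word, a reference back to the word, a return, and the fall-through path after THEN.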

It would of course not be convenient to nest loops but nested loops are a very dicey concept anyway. You may as well have nested definitions. We've talked over the last fifteen years about such things. Should you have conditional execution of a word or should you have something like IF THEN? Here is an example where I think it pays well in clarity: the only loop you have is to repeat the current word.

WORD ~~~ IF ~~~ WORD ;
THEN ~~~ ;

You can always do that and it leads to more intense factoring. And that is in my mind one of the keystones of Forth, you factor and you factor and you factor until most of your definitions are one or two lines long.

(Jeff) You might point out that your semicolon after WORD results in tail recursion and converting the call in WORD to a jump and that is how it functions.

(Chuck) So there is no reason to make that a call since you are never going to go anywhere afterwards so you just make that jump. In fact in all my latest Forths semicolon kind of meant either return or jump depending on the context and it's optimized in the compiler to do that. It's a very simple look back optimization that actually saves a very important resource, the return stack.

On i21 the return stack is only 17 deep. People who are used to nesting indefinitely might get into trouble here. You shouldn't nest too deeply. It makes programs impossible to follow. You can have spaghetti code with calls just as you can with GOTOs. You have got to keep it simple.

As I recall those are the major changes I have made to my current Forth. Maybe with the exception of BLOCK. BLOCK is a wonderful word. BLOCK accesses, used to access, a region of disk. Now I define it as accessing a region of memory. There is no reason to use the disk at all. With megabytes of memory available you just load your data into memory and go from there. There is no need for disk. So BLOCK becomes much much simpler. Basically the definition of BLOCK is a thousand times.

BLOCK 1024 * ;

That gives you access to a block of memory that is 1024 bytes wide. The value of BLOCK is that it partitions your memory for you. It factors your memory into manageable pieces. You can talk about a megabyte of memory or a thousand blocks of memory.
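On a hosted standard Forth, memory does not start at address zero, so a comparable sketch has to add the block offset to the start of some reserved region. The names `pool` and `mblock` below are hypothetical, and the standard word BLOCK is left untouched.

```forth
\ Reserve room for four 1024-byte blocks (the count is arbitrary).
create pool   4 1024 * allot

\ mblock: turn a block number into the address of that block
: mblock   1024 * pool + ;

char A  2 mblock c!     \ store a byte at the start of block 2
2 mblock c@ emit        \ prints A
```

The multiplication is the whole definition; the added `pool +` is only there because this sketch borrows its memory from the host system.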

I encountered a NASA webpage just today which is amusing. I think it was using 110,000 kilometers per hour as the speed of the spacecraft, the Stardust spacecraft. Conveniently, since I don't really have a feel for that number they converted it to 69,000 miles per hour for the convenience of people who don't think in those numbers. Those numbers are too large to have any meaning to anyone. I think it converts to so many kilometers per second or some nice small number that I have a feel for. It doesn't do any good to let the number get big. When we are dealing with one hundred and twenty megabytes of memory the numbers get big, just begging to let the numbers get big. It is impressive, but it isn't useful. So BLOCKs help restore the scale.

Well one thing to say is that Forth is what Forth programmers do. I would like to think of it as what Forth programmers ought to do. Because I have found that teaching someone Forth does not mean that he is going to be a good Forth programmer. There is something more than the formalism and syntax of Forth that has got to be embedded in your brain before you're going to be effective at what you do.

My contention is that every application that I have seen that I didn't code has ten times as much code in it as it needs. And I see Forth programmers writing applications with ten times as much code as is necessary.

The concern that I have, the problem that I have been pondering for the last few years is: How can I persuade these people to write good Forth? How can I persuade them that it's possible to write good Forth? Why would anyone want to write ten times as much as they would need to write?

Microsoft does this, I'm sure you're all aware, but they almost have an excuse for doing it because they are trying to be compatible with everything they have ever done in the past. If it is impossible for you to start with a clean piece of paper then you will have to write more code. But ten times as much code? That seems excessive.

How big should a program be? For instance, how large should the TCP/IP stack be? I don't know. I couldn't know without sitting down and writing the code for it. But it should not be very big, a kiloword.

The i21 has four instructions per word. The Pentium has one instruction per two bytes. It is very hard to judge, you should talk in instructions instead of the size of memory in which the instructions reside.

About a thousand instructions seems about right to me to do about anything. To paraphrase the old legend that any program with a thousand instructions can be written in one less. All programs should be a thousand instructions long.

How do you get there? What is the magic? How can you make applications small? Well you can do several things that are prudent to do in any case and in any language.

No Hooks

One is No Hooks. Don't leave openings in which you are going to insert code at some future date when the problem changes because inevitably the problem will change in a way that you didn't anticipate. Whatever the cost it's wasted. Don't anticipate, solve the problem you've got.

Don't Complexify

Simplify the problem you've got or rather don't complexify it. I've done it myself, it's fun to do. You have a boring problem and hiding behind it is a much more interesting problem. So you code the more interesting problem and the one you've got is a subset of it and it falls out trivial. But of course you wrote ten times as much code as you needed to solve the problem that you actually had.

Ten times code means ten times cost; the cost of writing it, the cost of documenting it, the cost of storing it in memory, the cost of storing it on disk, the cost of compiling it, the cost of loading it, everything you do will be ten times as expensive as it needed to be. Actually worse than that because complexity increases exponentially.

10x Code
10x Cost
10x Bugs
10x Maintenance

Ten times the bugs! And ten times the difficulty of doing maintenance on the code, as is amply illustrated by the Y2K bug. In fact it is curious: the fixes that I see people making to the COBOL programs to fix the Y2K bug make the programs significantly more complex and larger and introduce spaghetti code that can't be maintained and is probably going to fail again in fifty years. They are just using windows. They are not increasing the range of the date, they are merely shifting it, so that is going to lead to another problem when that window runs out.
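The talk does not show any of these fixes. As a hedged sketch of what a date window generally looks like, here is the idea in standard Forth; the pivot value of 50 and the word names are assumptions made for this example.

```forth
\ Windowing: two-digit years below the pivot are read as 20xx, the rest as 19xx.
\ The pivot of 50 is an assumption made for this sketch.
50 constant pivot

\ expand-year: turn a two-digit year into a four-digit year
: expand-year   dup pivot < if 2000 + exit then 1900 + ;

 5 expand-year .   \ prints 2005
99 expand-year .   \ prints 1999
```

The stored year still has only two digits, so the representable range stays at one hundred years (1950 to 2049 with this pivot). The window shifts the range without widening it.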

This is why we are still running programs which are ten or twenty years old and why people can't afford to update, understand, and rewrite these programs because they are significantly more complex, ten times more complex than they should be.

So how do you avoid falling into this trap? How do you write one times programs?

One times, 1x. That would make a good name for a web page.

You factor. You factor, you factor, you factor and you throw away everything that isn't being used, that isn't justified.

The whole point of Forth was that you didn't write programs in Forth you wrote vocabularies in Forth. When you devised an application you wrote a hundred words or so that discussed the application and you used those hundred words to write a one line definition to solve the application. It is not easy to find those hundred words, but they exist, they always exist.

Let me give you an example of an application in which you can not only reduce the amount of code required by 90%; here is a case where you can reduce the code by 100%. It is a topic that is dear to our hearts, it's called FILES. If you have files in your application, in your Forth system, then you have words like

OPEN
CLOSE
READ
WRITE
REWIND whatever

and they are arguably not going to be such short words. They are going to be words like OPEN-FILE because of all kinds of things that you want to be opening and closing like windows.

If you can realize that this is all unnecessary you save one hundred percent of the code that went into writing the file system. Files are not a big part of any typical application but it is a singularly useless part. Identify those aspects of what you are trying to do and say we don't need to do that. We don't need checksums on top of checksums. We don't need encryption because we aren't transmitting anything that we don't need. You can eliminate all sorts of things.

Now that's the general solution to a problem that all the programmers in the world are out there inventing for you, the general solution, and nobody has the general problem.

I wish I knew what to tell you that would lead you to write good Forth. I can demonstrate. I have demonstrated in the past, ad nauseam, applications where I can reduce the amount of code by 90% and in some cases 99%. It can be done, but on a case by case basis. The general principle still eludes me.

(Jeff) I had a question about your screens in Color Forth. People have commented that you have a very large font and very little information on the screen. How much of that is because of your vision and how much is to limit the information you're looking at at one time?

(Chuck) I am losing patience with small characters and very hard to read web pages. With my eyes the characters are blurred, with my reading glasses the characters are blurred. If I increase the character size, which sometimes I can do and sometimes I can't, I lose a lot of context. It is a problem. It is probably a problem for an increasing percentage of the population so I make the characters as big as I can, but Jeff's right, if you make them too big you lose information.

Now it is a classic rule of thumb for designing slides that you pick a frame and you put up some bullets and topics and you don't try to put too much information on a slide because you only confuse your audience and if you make the characters small they won't be able to read them either.

In the case of my Color Forth I think my characters are probably too large. I can get 256 on the screen at once. I get 20 x 14 or sometimes 24 x 15, it depends on which computer I am using. That is enough. In this 256 bytes I get about as much information as I used to get in 1024 bytes because I'm not doing any formatting. I'm not even doing any word wrap. I'm just packing the screen densely with characters.

One reason is that I want to explore the value of the color words. If I have some colorful words here and some different color how will that work? I find that it works very well. I don't need to format this with all defined words on the left. In fact that might be ugly because I have a wall of red on the left and if it is organized that way you don't need to make it red.

I do feel that when you are putting up a web page for instance that you should take this philosophy. Put up as little information on a web page as you have to to make it clearer what you are trying to say to people. Don't get wordy about it. On the other hand don't put a web page with an index, that is a nuisance. You want to put up real information. You want to highlight the information that's important. But you want to make it clear and readable.

It isn't easy for me to change my character size. It is a 32x32 pixel, I will probably go to 24x24 pixels next time I try it.

In this form an application is rarely one screen. An application is probably about two or three screens and that is about as much code as I write for an application within a context.

For instance I have an application that puts up spectra on the screen relating to the performance of a particular chip and that application takes four or five of these screens. It is nice spectra, a nice exercise in presenting information in an understandable and important way. If you get a chance to see a demo of that sometime it would be fun.

Small applications, application isn't the right word. Small bits of code that do a particular thing and are not generalized to do anything else.

Jeff has reminded me of a couple of other concepts in Machine Forth. Machine Forth is what I like to call using the Forth primitives built into the computer instead of interpreted versions of those or defining macros that do those. One of those is IF.

Classically IF drops the thing that's on the stack and this was inconvenient to do on i21 so IF leaves its argument on the stack and very often you are obliged to write constructs like IF DROP. But not always. It seems that about as often as it is inconvenient to have IF leave an argument on the stack it is convenient to have that happen. It avoids using a DUP IF or a ?DUP. So with this convention the need for ?DUP has gone away. And ?DUP is a nasty word because it leaves a variable number of things on the stack and that is not a wise thing to do.
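For comparison, here is how the two styles look in standard Forth, where IF does consume its argument. This is only a sketch with hypothetical word names; it does not reproduce the i21 behaviour, where the DUP would be unnecessary.

```forth
\ Classic style: ?DUP leaves one item if nonzero and none if zero,
\ so the stack depth after it depends on the data.
: .nonzero-classic   ?dup if . then ;

\ Same behaviour with fixed stack effects at every step:
\ DUP before IF, and DROP on the path that does not print.
: .nonzero   dup if . exit then drop ;

0 .nonzero   \ prints nothing
9 .nonzero   \ prints 9
```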

IF
-IF

In addition to IF, Machine Forth has a minus if -IF. This one is testing zero and this one is for the sign bit only. It was my thinking that very often you are going to make a decision on whether a number is positive or negative. It hasn't worked out that way. In Color Forth I don't even bother to use this instruction.

The world has changed in the last twenty years in ways that maybe are obvious but I think no one would have anticipated. When I first started in this business in fifty-seven, computers were used for calculating, computing. And the applications in those days usually involved large long complex algebraic expressions. The factorization that we did was to factor things that would not have to be recomputed so that the whole thing would go faster, and that was the whole point of FORTRAN. That tradition has stuck with us even today.

I don't know the statistics but I would guess that most computers don't compute, they move bytes around. If you have a browser your browser is not calculating anything except maybe the limit of what will fit on the screen at one time. This concept of looking at the sign of a number was not as useful as I suspected. Almost all numbers are positive so there is no sign to look at. For the same reason Machine Forth does not have a subtract operator. I use this symbol to be a ones complement.

- ones comp.

It isn't even very easy to do a subtract. But for today's applications, implementing protocols or displaying text, arithmetic is not necessary. A computer should not be optimized for arithmetic and mine are not.
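When a subtraction is needed, it can be built from the ones complement and an add, because in two's complement arithmetic a - b equals a plus the ones complement of b, plus one. A minimal sketch in standard Forth, where INVERT is the ones complement and `sub` is a hypothetical name:

```forth
\ a - b = a + (ones complement of b) + 1 in two's complement arithmetic.
\ sub is a hypothetical name; the standard word for this is - .
: sub   invert 1+ + ;

10 3 sub .   \ prints 7
3 10 sub .   \ prints -7
```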

On the other hand in order to optimize data transfer it is very useful to have an incremented fetch operator. One of the problems I suppose on any computer, but it is particularly apparent on i21, is addresses. On i21 an address is a twenty bit number. To load an address you pretty much have to do a literal fetch which takes an extra twenty bit word. It takes an extra memory cycle to load the address and then you can do the fetch against the address which takes another memory cycle. So the manipulation of address is expensive and you want to do it as little as possible. Fetch-plus (@+) helps with that. You put an address in the address register, which is called A, and it stays there a while. And if you execute this operator (@+) A gets incremented and you can fetch a sequence of things in order. Likewise you can store a sequence of things. And of course you have fetch (@) and store (!) without increment for when you need them.

These operators are not in classic Forth. I don't think they are even mentioned in the standard. They lead to a completely different style of programming. In the case of the DO LOOP the efficient thing to do was actually put the address as your loop control parameter and then refer to it as I and do an I fetch (I @) inside the loop. And the DO LOOP is working with the addresses. If you don't do that, if you have the fetch-plus (@+) operator you don't need the DO LOOP, you don't need the I, you use the fetch-plus (@+) inside of the loop to fetch the thing that's different each time. It is different but equivalent. You can map one to the other. In addition to the notion of having to fetch something whose address is conveniently stored in the A register.
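To give a feel for this style on a standard Forth, the A register can be imitated with a variable. Everything named below (`a-reg`, `set-a`, `@a+`, `add-next`, `total`, `samples`) is hypothetical, and the sketch says nothing about how the i21 implements the operator in hardware.

```forth
\ a-reg stands in for the A register.
variable a-reg
: set-a   a-reg ! ;

\ @a+: fetch the cell A points at, then advance A by one cell
: @a+   a-reg @ @   1 cells a-reg +! ;

create samples   10 , 20 , 30 ,

\ add-next: add the next count cells to the running total, looping by re-entry
: add-next   dup if  1- swap @a+ + swap  recurse exit  then  drop ;

\ total: sum count cells starting at addr
: total   swap set-a  0 swap add-next ;

samples 3 total .   \ prints 60
```

No DO LOOP and no I appear: the address is set once and each `@a+` yields the next item.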

In the case of MOVE, where you want to move something from one region of memory to another you have to have two addresses. Hence the value of being able to address a value stored in the R register. Since that is probably the only context that you will be using an address in the R register it has an automatic increment associated with it. So I've got basically a fetch-plus (@+) against A and a fetch, and a fetch-R against R or a store-R so you can do a MOVE operator efficiently.

A brings up another issue. A acts very much like a local variable, a place where you can store something for a while and then retrieve it later in addition to acting as an address register. The reason that it acts as an address register, the reason I have it as an address is literally to provide a mechanism for this (@+). It is more convenient for the address to be on the stack from the programmer's standpoint, but if you are going to access repetitively you have to put it in a place where you can increment it. To put it there you have to be able to access that register, if you are going to do that you can use that register for other things. Just like you can use the return stack for other things.

The difference is that when you put something on the return stack you have to take it off again and with A you don't. There has been a temptation to make A into a stack you could push several things onto A and pop them back again. It has never been clear that that was worth the cost of doing. It would require more instructions to access A. Do you want to DUP A? Do you want to DROP A? So far it has just been easier to leave A as it is.

I probably want another register which I would call M to hold a multiplier in doing a forty bit multiplication. But I have never had enough multiplies in the system to be worth doing that so it hasn't happened.

But such registers raise the question of local variables. There is a lot of discussion about local variables. That is another aspect of your application where you can save 100% of the code. I remain adamant that local variables are not only useless, they are harmful.

If you are writing code that needs them you are writing non-optimal code. Don't use local variables. Don't come up with new syntaxes for describing them and new schemes for implementing them. You can make local variables very efficient especially if you have local registers to store them in, but don't. It's bad. It's wrong.

It is necessary to have variables. Color Forth has got a whole slew of system variables for things that are necessary and it is very useful when you are editing something to have a cursor position variable so when you come back the cursor is still there and you can pick up where you left off. Variables are essential. I don't see any use for a small number of variables and I don't see any use for variables which are accessed instantaneously.

It is an exercise in cleverness in interpreting the stack diagrams and in assigning names to words. You can play all kinds of games here. I think perhaps Forth programmers play too many games with the tool they have because there are no applications. Much better if Forth programmers would concentrate on writing the applications rather than refining the tool.

To me the application du jour is a web browser. If you have run out of things to do write a web browser. Netscape did not have the last word on what it should be or how it should look or what it should do. In fact Netscape and Microsoft both borrowed heavily from the Mosaic browser. It is as if there was only one browser written. It is as if there was only one language written, and it's FORTRAN.

Write a new browser. It's a good application. It gives you access to a world of information. It's a good application and it is one that I am trying to focus on in my spare time.

So for those of you who are listening, and have listened all the way through this tape, thank you.



UltraTechnology
2512 10th St.
Berkeley, CA 94710-2520
