Thursday, July 23, 2009

Forth: 1% the code of C for a given app

1% the code
This is a provocative statement. It warrants some discussion.
C programs
I've studied many C programs in the course of writing device drivers for colorForth. Some manufacturers won't make documentation available, instead referring to Linux open source.

I must say that I'm appalled at the code I see. Because all this code suffers the same failings, I conclude it's not a sporadic problem. Apparently all these programmers have copied each other's style and are content with the result: that complex applications require millions of lines of code. And that's not even counting the operating system required.

Sadly, that is not an undesirable result. Bloated code does not just keep programmers employed, but managers and whole companies, internationally. Compact code would be an economic disaster. Because of its savings in team size, development time, storage requirements and maintenance cost.

What's wrong with C programs?
Some problems are intrinsic to the C language:
It has elaborate syntax. Rules that are supposed to promote correctness, but merely create opportunity for error.
It has considerable redundancy. This increases trivial errors that can be detected. And program size.
It's strongly typed, with a bewildering variety of types to keep straight. More errors.
As an infix language, it encourages nested parentheses. Sometimes to a ludicrous extent. They must be counted and balanced.
It's never clear how efficiently source will be translated into machine language. Constructs are often chosen because the programmer knows they're efficient. Subroutine calls are expensive.
Because of the elaborate compiler, object libraries must be maintained, distributed and linked. The only documentation usually addresses this (apparently difficult) procedure.
Others are a matter of style:
Code is scattered in a vast hierarchy of files. You can't find a definition unless you already know where it is.
Code is indented to indicate nesting. As code is edited and processed, this cue is often lost or incorrect.
Sometimes a line of code contains only a parenthesis, or semicolon. This reduces the density of the code, and increases the difficulty of reading it.
There's no documentation. Except for the ubiquitous comments. These interrupt the code, further reducing density, but rarely conveying useful insight.
Names tend to be hyphenated. This makes them unique and displays their position in the hierarchy. The significant portion of a name is hard to detect, slow to read.
Constants, particularly fields within a word, are named. Even if used, the name rarely provides enough information about the function. And requires continual cross-reference to the definition.
Preoccupation with contingencies. In a sense it's admirable to consider all possibilities. But the ones that never occur are never even tested. For example, the only need for software reset is to recover from software problems.
Conditional compilation. More constants include or exclude code for particular platforms. More indentation. More difficulty fathoming which code is relevant.
Hooks for future enhancements, or abandoned features, are abundant. This is useful only in understanding the programmer's ambitions.
It is in a programmer's best interest to exaggerate the complexity of his program.
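To make the point about named constants for fields within a word concrete, here is a minimal C sketch. The register layout, the `CTRL_*` names and both functions are hypothetical, invented only for illustration; they do not come from any driver mentioned in this post. The first function uses the named-constant style, the second builds the same word from literals with the layout noted once beside the code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical control word, invented for this sketch:
   bits 0-3 = channel, bit 4 = enable, bits 8-15 = divisor. */
#define CTRL_CHANNEL_SHIFT 0
#define CTRL_CHANNEL_MASK  0x0000000Fu
#define CTRL_ENABLE_BIT    0x00000010u
#define CTRL_DIVISOR_SHIFT 8
#define CTRL_DIVISOR_MASK  0x0000FF00u

/* Named-constant style: reading this line means looking up
   five definitions to learn which bits are touched. */
uint32_t ctrl_named(uint32_t channel, uint32_t divisor)
{
    return ((channel << CTRL_CHANNEL_SHIFT) & CTRL_CHANNEL_MASK)
         | CTRL_ENABLE_BIT
         | ((divisor << CTRL_DIVISOR_SHIFT) & CTRL_DIVISOR_MASK);
}

/* Literal style: the same word, with the field layout stated in place. */
uint32_t ctrl_literal(uint32_t channel, uint32_t divisor)
{
    /* divisor in bits 8-15, enable in bit 4, channel in bits 0-3 */
    return ((divisor & 0xFFu) << 8) | 0x10u | (channel & 0xFu);
}

/* Both styles must produce identical words for any input. */
void ctrl_check(uint32_t channel, uint32_t divisor)
{
    assert(ctrl_named(channel, divisor) == ctrl_literal(channel, divisor));
}
```

Calling `ctrl_check(3, 0x20)` from a test harness passes, since both functions build the same word. Which version is easier to read is the judgment this post is making; the sketch only shows that the names add no behavior.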
Another difficulty is the mindset that code must be portable across platforms and compatible with earlier versions of hardware/software. This is nice, but the cost is incredible. Microsoft has based a whole industry on such compatibility.
Forth
colorForth does it differently. There is no syntax, no redundancy, no typing. There are no errors that can be detected. Forth uses postfix, so there are no parentheses. No indentation. Comments are deferred to the documentation. No hooks, no compatibility. Words are never hyphenated. There's no hierarchy. No files. No operating system.
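As a rough illustration of why postfix needs no parentheses, the C sketch below evaluates a postfix string with a small stack. It is not how colorForth is implemented; the function name `eval_postfix`, the `STACK_DEPTH` limit and the restriction to non-negative integers with `+ - * /` are assumptions made for this example. The infix C expression `1 - (2 * (3 + 4))` becomes the postfix string `1 2 3 4 + * -`, with evaluation order fixed by position alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define STACK_DEPTH 16  /* arbitrary limit chosen for this sketch */

/* Evaluate a space-separated postfix expression of non-negative
   integers and the operators + - * /. Malformed input trips an
   assert; division by zero is not checked. */
long eval_postfix(const char *src)
{
    long stack[STACK_DEPTH];
    int depth = 0;

    while (*src) {
        if (isspace((unsigned char)*src)) {
            src++;                          /* skip separators */
        } else if (isdigit((unsigned char)*src)) {
            char *end;
            long value = strtol(src, &end, 10);
            assert(depth < STACK_DEPTH);
            stack[depth++] = value;         /* operand: push it */
            src = end;
        } else {
            long a, b;
            assert(depth >= 2);
            b = stack[--depth];             /* operator: pop two, */
            a = stack[--depth];             /* push one result    */
            switch (*src++) {
            case '+': stack[depth++] = a + b; break;
            case '-': stack[depth++] = a - b; break;
            case '*': stack[depth++] = a * b; break;
            case '/': stack[depth++] = a / b; break;
            default:  assert(!"unknown operator");
            }
        }
    }
    assert(depth == 1);                     /* exactly one result left */
    return stack[0];
}
```

For example, `eval_postfix("2 3 + 4 *")` computes the same value as the C expression `(2 + 3) * 4`, with nothing to count or balance.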

Code is organized so that a block of related words fits on the screen. Names are short with a full semantic load. The definition of a word is typically 1 line. Machine code has a one-to-one correspondence with source.

An application is organized into multiple user interactions, with unique display and keypad. Each is compiled when accessed. Its code is independent, names need not be unique. A background task is always running.
Comparison
Yes, I could write a better C program than those I've seen. It wouldn't be nearly as good as Forth. I can't write an assembler program as good as Forth. No, I don't think Forth is the best possible language. Yet.

But does this add up to 1% the code? Where is the C program I've recoded? No one has paid me to do that. One difficulty is comparing my Forth with the original C. I cheat. The 1% code merely starts an argument that they're not the same.

For example, my VLSI tools take a chip from conception through testing. Perhaps 500 lines of source code. Cadence, Mentor Graphics do the same, more or less. With how much source/object code? They use schematic capture, I don't. I compute transistor temperature, they don't.

But I'm game. Give me a problem with 1,000,000 lines of C. But don't expect me to read the C, I couldn't. And don't think I'll have to write 10,000 lines of Forth. Just give me the specs of the problem, and documentation of the interface.
My Conclusion

colorForth's incredibly small applications provide new estimates of their overstated complexity.
