http://www.advogato.org/article/398.html
OBJECT PREVALENCE
Posted 23 Dec 2001 at 03:46 UTC by KlausWuestefeld
Transparent Persistence, Fault-Tolerance and Load-Balancing for Java Systems.
Orders of magnitude FASTER and SIMPLER than a traditional DBMS. No pre- or post-processing required, no weird proprietary VM required, no base-class inheritance or clumsy interface definitions required: just PLAIN JAVA CODE.
How is this possible?
Question: RAM is getting cheaper every day. Researchers are announcing major breakthroughs in memory technology. Even today, servers with multi-gigabyte RAM are commonplace. For many systems, it's already feasible to keep all business objects in RAM. Why can't I simply do that and forget all the database hassle?
Answer: You can, actually.
Are you crazy? What if there's a system crash?
To avoid losing data, every night your system server saves a snapshot of all business objects to a file using plain object serialization.
What about the changes that occurred since the last snapshot was taken? Won't the system lose those in a crash?
No.
How come?
All commands received from the system's clients are converted into serializable objects by the server. Before being applied to the business objects, each command is serialized and written to a log file. During crash recovery, first, the system retrieves its last saved state from the snapshot file. Then, it reads the commands from the log files created since the snapshot was taken. These commands are simply applied to the business objects exactly as if they had just come from the system's clients. The system is then back in the state it was just before the crash and is ready to run.
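For concreteness, here is a minimal sketch of that recovery sequence, assuming a hypothetical Command interface; the names are illustrative, not Prevayler's actual API:

import java.io.*;

// Hypothetical command abstraction: anything serializable that can be
// applied to the business objects.
interface Command extends Serializable {
    void executeOn(Object businessSystem);
}

class RecoverySketch {
    static Object recover(File snapshotFile, File commandLogFile) throws Exception {
        // 1. Restore the last saved state from the snapshot file.
        ObjectInputStream snapshot = new ObjectInputStream(new FileInputStream(snapshotFile));
        Object businessSystem = snapshot.readObject();
        snapshot.close();

        // 2. Re-apply every command logged since the snapshot, in order,
        // exactly as if it had just arrived from a client.
        ObjectInputStream log = new ObjectInputStream(new FileInputStream(commandLogFile));
        try {
            while (true) {
                Command command = (Command) log.readObject();
                command.executeOn(businessSystem);
            }
        } catch (EOFException endOfLog) {
            // End of log: the system is back in its pre-crash state.
        }
        log.close();
        return businessSystem;
    }
}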
Does that mean my business objects have to be deterministic?
Yes. They must always produce the same state given the same commands.
Doesn't the system have to stop or enter read-only mode in order to produce a consistent snapshot?
No. That is a fundamental problem with transparent or orthogonal persistence projects like PJama (http://www.dcs.gla.ac.uk/pjava/) but it can be solved simply by having all system commands queued and routed through a single place. This enables the system to have a replica of the business logic on another virtual machine. All commands applied to the "hot" system are also read by the replica and applied in the exact same order. At backup time, the replica stops reading the commands and its snapshot is safely taken. After that, the replica continues reading the command queue and gets back in sync with the "hot" system.
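As a rough illustration of the replica-side snapshot, assuming the replica applies queued commands under a shared lock (all names here are invented, not Prevayler's API):

import java.io.*;

class ReplicaSnapshotSketch {
    // The replica's command-consuming thread synchronizes on queueLock before
    // applying each command, so holding the lock here guarantees a quiescent,
    // consistent object graph while the snapshot is serialized.
    static void takeSnapshot(Object businessSystem, Object queueLock, File snapshotFile)
            throws IOException {
        synchronized (queueLock) {
            ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(snapshotFile));
            out.writeObject(businessSystem);
            out.close();
        }
        // Once the lock is released, the replica resumes reading the command
        // queue and catches up with the "hot" system.
    }
}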
Doesn't that replica give me fault-tolerance as a bonus?
Yes it does. I have mentioned one but you can have several replicas. If the "hot" system crashes, any other replica can be elected and take over. Of course, you must be able to afford a machine for every replica you want.
Does this whole scheme have a name?
Yes. It is called system prevalence. It encompasses transparent persistence, fault-tolerance and load-balancing.
If all my objects stay in RAM, will I be able to use SQL-based tools to query my objects' attributes?
No. You will be able to use object-based tools. The good news is you will no longer be breaking your objects' encapsulation.
What about transactions? Don't I need transactions?
No. The prevalence design gives you all transactional properties without the need for explicit transaction semantics in your code.
How is that?
DBMSs tend to support only a few basic operations: INSERT, UPDATE and DELETE, for example. Because of this limitation, you must use transaction semantics (begin - commit) to delimit the operations in every business transaction for the benefit of your DBMS. In the prevalent design, every transaction is represented as a serializable object which is atomically written to the queue (a simple log file) and processed by the system. An object, or object graph, is enough to encapsulate the complexity of any business transaction.
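A hedged example of that idea, with invented Account and TransferCommand classes: the whole business transaction travels as one serializable object, so logging it is atomic by construction.

import java.io.*;
import java.util.*;

class Account implements Serializable {
    private long balance;
    void deposit(long amount) { balance += amount; }
    void withdraw(long amount) { balance -= amount; }
}

// One object encapsulates the whole business transaction. It is written to
// the command log in a single atomic write and re-applied as a unit on recovery.
class TransferCommand implements Serializable {
    private final String fromId, toId;
    private final long amount;

    TransferCommand(String fromId, String toId, long amount) {
        this.fromId = fromId;
        this.toId = toId;
        this.amount = amount;
    }

    void executeOn(Map accountsById) {
        ((Account) accountsById.get(fromId)).withdraw(amount);
        ((Account) accountsById.get(toId)).deposit(amount);
    }
}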
What about business rules involving dates and time? Won't all those replicas get out of sync?
No. If you ask the use-case gurus, they will tell you: "The clock is an external actor to the system." This means that clock ticks are commands to the business objects and are sequentially applied to all replicas, just like all other commands.
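A sketch of a clock tick as just another command (BusinessSystem and setClock are assumed names): business rules then read the system clock instead of calling new Date(), so all replicas compute with identical times.

import java.io.*;
import java.util.*;

interface BusinessSystem {
    void setClock(Date now);
}

// The clock is an external actor: its ticks enter the system as serializable
// commands, logged and applied to every replica in the same order.
class ClockTickCommand implements Serializable {
    private final Date now;

    ClockTickCommand(Date now) { this.now = now; }

    void executeOn(BusinessSystem system) {
        system.setClock(now);
    }
}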
Is object prevalence faster than using a database?
The objects are always in RAM, already in their native form. No disk access or data marshalling is required. No persistence hooks placed by preprocessors or postprocessors are required in your code. No "isDirty" flag. No restrictions. You can use whatever algorithms and data-structures your language can support. Things don't get much faster than that.
Besides being deterministic and serializable, what are the coding standards or restrictions my business classes have to obey?
None whatsoever. Each command issued to your business objects, though, must be represented as a serializable object. Typically, you will have one command class for each use-case in your system.
How scalable is object prevalence?
The persistence processes run completely in parallel with the business logic. While one command is being processed by the system, the next one is already being written to the log. Multiple log files can be used to increase throughput. The periodic writing of the snapshot file by the replica does not disturb the "hot" system in the slightest. Of course, tests must be carried out to determine the actual scalability of any given implementation but, in most cases, overall system scalability is bound by the scalability of the business classes themselves.
Can't I use all those replicas to speed things up?
All replicas have to process all commands issued to the system. There is no great performance gain, therefore, in adding replicas to command-intensive systems. In query-intensive systems such as most Web applications, on the other hand, every new replica will boost the system because queries are transparently balanced between all available replicas. To enable that, though, just like your commands, each query to your business logic must also be represented as a serializable object.
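For illustration, a query as a serializable object might look like this (SubscriberSystem, paymentStatus and the class names are invented): being serializable, it can be shipped to whichever replica is least loaded.

import java.io.*;

interface SubscriberSystem {
    Object paymentStatus(String contractId);
}

// A read-only, serializable query: it never changes state, so any up-to-date
// replica can evaluate it, which is what makes transparent balancing possible.
class PaymentStatusQuery implements Serializable {
    private final String contractId;

    PaymentStatusQuery(String contractId) { this.contractId = contractId; }

    Object evaluateOn(SubscriberSystem system) {
        return system.paymentStatus(contractId);
    }
}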
Isn't representing every system query as a serializable object a real pain?
That's only necessary if you want transparent load-balancing, mind you. Besides, the queries for most distributed applications arrive in a serializable form anyway. Take Web applications for example: aren't HTTP request strings serializable already?
Does prevalence only work in Java?
No. You can use any language for which you are able to find or build a serialization mechanism. In languages where you can directly access the system's memory, and if the business objects are held in a specific memory segment, you can also write that segment out to the snapshot file instead of using serialization.
Is there a Java implementation I can use?
Yes. You will find Prevayler - The Open-Source Prevalence Layer, an example application and more information at http://www.prevayler.org. It does not yet implement automatic load-balancing, but it does implement transparent business object persistence, and replication is in the oven.
Is Prevayler reliable?
Prevayler's robustness comes from its simplicity. It is orders of magnitude simpler than the simplest RDBMS. Although I wouldn't use Prevayler to control a nuclear plant just yet, its open-source license ensures that the whole software development community has the ability to scrutinize, optimize and extend Prevayler. The real questions you should bear in mind are: "How robust is my Java Virtual Machine?" and "How robust is my own code?". Remember: you will no longer be writing feeble client code. You will now have the means to actually write server code. It's the way object orientation was intended all along; but it's certainly not for wimps.
You said Prevayler is open-source software. Do you mean it's free?
That's right. It's licensed under the Lesser General Public License.
But what if I'm emotionally attached to my database?
For many applications, prevalence is a much faster, much cheaper and much simpler way of preserving your objects for future generations. Of course, there will be all sorts of excuses to hang on to "ye olde database", but at least now there is an option.
---------------------------------------------------------------------
ABOUT THE AUTHOR
KlausWuestefeld enjoys writing good software and helping other people do the same. He has been doing so for 17 years now. He can be contacted at klaus@objective.com.br.
---------------------------------------------------------------------
"PREVAYLER" and "OPEN-SOURCE PREVALENCE LAYER" are trademarks of Klaus Wuestefeld.
Copyright (C) 2001 Klaus Wuestefeld.
Unmodified, verbatim copies of this text including this copyright notice can be freely made.
Interesting but..., posted 23 Dec 2001 at 07:08 UTC by ncm »
There are still quite a few things we really need transactions for:
When you make the first of a series of changes to objects in the database, you typically break one or more database invariants until you get the last change entered. Other processes looking at the database had better either wait, or had better see the state it had before you started. To get much concurrency, you need to snapshot the state before the first change.
If you get halfway through a series of changes and crash, the system had better come back up without the changes you made, because you're not going to be equipped to continue where you left off.
If you get halfway through a series of changes and discover some condition that keeps you from finishing, you had better be able to just drop the changes and pick up with the original snapshot.
If N processes make a series of conflicting changes concurrently, (N-1) of them had better be told that their changes have failed, and that they must try again.
There's a reason that databases are written by career professionals. A simple object database can be really useful, but that doesn't make it a substitute for the real thing. That's part of why so many "object database" companies failed some ten years back.
Transactions, posted 23 Dec 2001 at 09:03 UTC by Pseudonym »
Actually, transactions are not so important in the external interface of an OODBMS. In an RDBMS, a manipulation typically involves several SQL statements (e.g. insert, update, remove) each of which can act on only one table at a time. So if a transaction needs to manipulate more than one table, you need to ensure that the set of statements is atomic by issuing a transaction.
In an OODBMS, where manipulation methods can operate on more than one class, the need is reduced somewhat. Internally, you can just queue up the command logs until the method is complete, then write them out together. Then the problem becomes entirely one of synchronisation. It's not quite ACID, but it'll do for most business applications.
You (ncm) are right, however, in that this solution, while no doubt excellent for many purposes (e.g. if you're happy with the robustness and performance of MySQL, you'll probably be happy with this, too), won't scale to many critical applications. For example, it would be quite hard to handle replication in any sane manner.
Klaus, as a matter of interest, how did you manage to get Java to force flushing to disk?
Scalability ?, posted 23 Dec 2001 at 12:03 UTC by jneves »
Is it just me, or is Prevayler, as it stands, useful only on a uniprocessor machine? You process requests one at a time, which means that two different requests can't be processed at the same time on different processors. And when you have several replicas, you have to have some coordination between all replicas to ensure the order of the requests. Or am I missing something here?
Interesting but problematic, posted 23 Dec 2001 at 12:59 UTC by dannu »
Thank you (Klaus Wuestefeld) for your nice write-up. I mostly agree with the points made by the other repliers.
Let me discuss/ask some further points:
Distributed systems? If a system-prevalence-deployed application contacts other services or other servers, you have a synchronization problem. How do you handle that? I guess you end up doing 2PC-like synchronization between your prevalence servers.
Fine-grained tx-model versus all-or-nothing? In big business-object systems there are actually lots of small transactions. The system prevalence paradigm doesn't give you fine-grained application-side control, or does it? Note, though, that you can adapt (extended) 2PC transactions to work efficiently RAM-based while retaining persistent storage properties (by using RAM-based subtransactions and files or an RDBMS in the root transaction).
Scalability? Having all "commands queued and routed through a single place" doesn't scale very well. Consider one of these big 64-processor, multi-gigabyte machines using a gigabit card: you wouldn't want all requests to be serialized through a single bottleneck which involves I/O. With fine-grained distributed transactions you don't need this "single place" or even a single server. I appreciate the "do it in the background" approach, though, as an advance over requiring requests to be queued while saving the state. It's quite necessary for 24/7 systems.
In my opinion the complexity of 2PC systems comes from shortcomings of the commercial products (BEA WLE, WebSphere, Oracle etc.). They impose big, clumsy, quite old-fashioned development schemes where the developer is restricted and has to keep track of many conditions. This partly stems from the pain of underspecified and often incorrectly implemented XA interfaces (e.g. writing multithreaded programs with XA adapters from the main RDBMSs is a disaster).
I think that system prevalence would help in implementing web applications that are located on single systems. It is a simple enough paradigm to be used and understood by companies which often fail, or are very slow, with 2PC transaction systems. Handling of error conditions (pointed out by ncm) might still be a big problem.
just my 2 (soon to be) eurocent and best wishes!
holger
I'll be back..., posted 23 Dec 2001 at 14:08 UTC by KlausWuestefeld »
THANKS A LOT for the FEEDBACK!
This is the first forum outside of my working group to actually get the idea and give me some positive feedback.
I am just leaving on a trip right now (my wife is calling me ;) for Christmas and will be back on wednesday. Then, I will address all concerns: ACID properties, error-condition recovery, scalability, the works...
Just a note on scalability and concurrency to think about over Christmas: Suppose we have a subscriber management system that receives a file from a bank with 100000 (one-hundred-thousand) payment records. A prevalent server running on a regular desktop machine can handle a command/transaction for this in less than a millisecond and be ready for the next command.
Merry Christmas! See you soon.
testing, debugging, integration, and data migration, posted 23 Dec 2001 at 19:33 UTC by jrobbins »
I used to be a professional SmallTalk programmer, I also was a professional Lisp programmer. Both of those languages use the concept of a saved memory image as part of their normal development environment.
The simplicy of "just saving the system state" is a double-edged sword. The downside is that it is often hard to specify a particular system state that you might want to use for testing or debugging. If you ever get an object into a "bad state", it can be very hard to find out how it got into that state. In contrast, the impedence mismatch between OO systems and RDBMSs provides a natural boundary and conceptual bottleneck for testing and debugging. It is realtively easy to compare 2 database dumps to see what is different, or to populate the database with test data, or to see which INSERT statement introduced a particular row into the database. You could have test data consisting of a long set of commands, but that "algebraic" approach to testing does not scale well, and allows defects in mutators to mask defects in accessors.
One thing that I learned while trying to actually sell ST-80 systems to other divisions in a large company is that IS organizations see a standard RDBMS as an integration point. If your system uses an RDMBS, they can plan capacity on a shared database machine: they can generate ad-hoc reports, they can use standard tools for disk backups and such on the database machine only. Also, in the event that your system eventaully dies (is no longer maintained, or the license is not extended or whatever) they will at least have the data in a format that they can get out of your system's tables and into some other system.
Lastly, upgrades were always a pain in image-based tools. Very incremental changes (like adding an instance variable to a class) can be handled by the serialization system. Any reoganization beyond that would require custom coding. In contrast, you can do small and mid-sized reorganizations a lot easier in SQL.
Why bother with disk at all?, posted 23 Dec 2001 at 20:23 UTC by egnor »
I'll take the opposite tack for variety:
If you're going this far, why bother with a disk at all? Just attach a battery to your RAM. If you want reliability, keep replicas. If a replica is lost, "clone" another one by freezing its message queue and copying the frozen image; the two clones can then "catch up" with the queued messages in parallel.
Copy-on-write VM tricks may soften the need to entirely freeze a replica during checkpointing.
I suspect the points raised in most of the comments can be fixed. (After all, suppose we were looking the other way. Compared to modern programming languages, databases and middleware systems have lots of horrible misfeatures, starting with bad syntax and ending with fundamentally broken models of (non-)encapsulation and (non-)reuse and (non-)genericity; the complaints in the other direction seem relatively trivial by comparison. How can any self-respecting software engineer stand to use today's RDBMS systems without feeling dirty all over?)
jrobbins's notes are the most interesting. It's worth noting that these are basically software engineering problems having to do with how to maintain long-running systems, not issues with the physical architecture proposed here. Is an RDBMS the best way to solve those software engineering problems? It's hard to believe. Are these problems worth solving for other domains? You betcha. I'd love to be able to upgrade my applications without restarting them. (Thanks to Debian, I can mostly upgrade my operating system without restarting it -- something users of e.g. Windows may have difficulty imagining.)
Relational algebra and persistence are both supposed to be simple, posted 24 Dec 2001 at 03:36 UTC by tk »
Data persistence is definitely not a new idea. In fact, if I remember correctly, persistent storage (ferrite cores) actually predates volatile storage. I guess it somehow faded away, only to emerge recently under the guise of persistent OSs such as EROS, persistent architectures such as Prevayler which we now discuss, and so on.
It's hard to see how relational algebra and persistence compare with each other. After all, relational algebra was supposed to be simple anyway -- data are nothing more than just lots of mathematical relations, right? We now know, however, that this 'simple' idea is fraught with practical problems.
Will the same happen for persistence? Maybe, or maybe not. As jrobbins mentioned, changing the 'shape' of objects is a problem, and there are probably many other problems.
XML Serialization, posted 24 Dec 2001 at 11:54 UTC by CryoBob »
I might be taking a bit of a simplistic view on the subject, but couldn't a lot of the issues raised by jrobbins, relating to testing and to having data in a useful format if a system is retired, be addressed by XML serialization? If we are going to be able to serialize all the commands and business objects, why not have an option or feature to dump this information to an XML file? Then, when tracking states, you could do a dump at each command and compare the XML output to see where things are going wrong.
XML serialization also has the advantage of being self-describing, rather than sitting in a group of tables in binary format on a database server. I mean, what happens if your RDBMS company goes bust and you can't get at the data because of a licence timeout, for example?
Obviously XML serialization will introduce another overhead to the system, but if implemented correctly you could serialize in binary format to boost performance and then, should you need to restore the state for investigative/testing/export purposes, load the objects through an object-to-XML parsing engine and look at the output.
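One possible sketch along these lines, using java.beans.XMLEncoder from JDK 1.4 (it handles JavaBean-style objects only, so a real prevalence layer would need a more general object-to-XML mapping):

import java.beans.XMLEncoder;
import java.io.*;

class XmlDumpSketch {
    // Writes a self-describing, diffable XML representation of a command or
    // business object, for debugging or data-export purposes.
    static void dump(Object object, File target) throws IOException {
        XMLEncoder encoder = new XMLEncoder(
                new BufferedOutputStream(new FileOutputStream(target)));
        encoder.writeObject(object);
        encoder.close();
    }
}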
Processing speed doesn't increase consistency, posted 25 Dec 2001 at 15:01 UTC by baueran »
Yes, you are right: in RAM a desktop machine may be able to process your 100,000 records in less than a second or something (I don't think that's representative of anything, though), but I do not think that makes the system necessarily more consistent or bullet-proof. What happens if you (or any of your client applications) run into a deadlock within a millisecond? How consistent will the rest of the system and data be without an ACID paradigm to rely on? Correct me if I'm just not getting the point, but I believe such an issue is not addressed in this approach.
trademarks?, posted 25 Dec 2001 at 18:45 UTC by dalke »
Minor point, but '"PREVAYLER" and "OPEN-SOURCE PREVALENCE LAYER" are trademarks of Klaus Wuestefeld'? I'm curious about trademarking a few things of my own, so I checked the USPTO. Neither mark is listed. Given the email address ending in ".br", are they only trademarked in Brazil?
I'm Back, posted 26 Dec 2001 at 18:59 UTC by KlausWuestefeld »
I agree that the Prevayler implementation, as it is today, is robust, fast and scalable enough for most applications.
In the company where I work there are 7 people working on two projects using Prevayler to be released in January. I am also glad to help any other early Prevayler adopters.
I would like to share some thoughts, though, on the use of prevalence "in the large" to make sure that we are not missing out on some very interesting possibilities.
First, I will give a few very quick, specific and UNJUSTIFIED answers, and then, in a separate comment, I will give a more complete explanation in an attempt to clarify all concerns so far...
Re: Interesting but..., posted 26 Dec 2001 at 19:10 UTC by KlausWuestefeld »
There are still quite a few things we really need transactions for: -- ncm
I apologize. Prevayler does have transactions.
Although a prevalent system can define transactions (commands) and provide them for a client to use, there is NO TRANSACTION SCHEME the client can use to arbitrarily define new TYPES of transactions (new atomic sets of business operations) whenever it fancies. The last thing we need is another transaction scheme allowing clients to bring business logic into their own hands.
I realize the article is confusing in this respect. I have corrected the "official" version of the article to make this clear.
When you make the first of a series of changes to objects in the database, you typically break one or more database invariants until you get the last change entered. Other processes looking at the database had better either wait, or had better see the state it had before you started.
Yes. In the prevalence scheme, the other processes shall wait.
To get much concurrency, you need to snapshot the state before the first change.
Hmmm. What if the waiting time for each transaction is only a few microseconds? (I shall explain...)
If you get halfway through a series of changes and crash, the system had better come back up without the changes you made, because you're not going to be equipped to continue where you left off.
Yes. The article already covers this well, though. Are there any doubts?
If you get halfway through a series of changes and discover some condition that keeps you from finishing, you had better be able to just drop the changes and pick up with the original snapshot.
"You" (the system server, I presume) will never be halfway through a series of changes and discover some condition that keeps "you" from finishing. (I shall explain...)
If N processes make a series of conflicting changes concurrently, (N-1) of them had better be told that their changes have failed, and that they must try again.
There are no concurrent changes in a prevalent scheme. All changes are sequenced.
There's a reason that databases are written by career professionals.
Yes. Databases are way too complex. ;)
A simple object database can be really useful, but that doesn't make it a substitute for the real thing. That's part of why so many "object database" companies failed some ten years back.
Prevalence is a persistence scheme, and, like OODBMSs, Prevayler will guarantee a logically crash-free object space for your business objects. Prevayler is not an object database manager, as I see it, though. It does not provide any sort of language for data storage or retrieval (ODBMSs normally provide some OQLish thing). Database managers are also worried, among other things, about how they will store chunks of data from RAM to disk and how they will retrieve those chunks later. When you have enough RAM for all your system data, you need no longer worry about that.
When you have enough RAM (the prevalence hypothesis) and a crash-free object space, many database career professionals' assumptions no longer hold.
Interesting but ... one has to free one's mind. New possibilities are waiting.
Re: MySQL Comparison, posted 26 Dec 2001 at 19:14 UTC by KlausWuestefeld »
(e.g. if you're happy with the robustness and performance of MySQL, you'll probably be happy with this, too)
Of course you will be happy! Prevayler is much more robust* and much faster** than MySQL. ;)
* Robustness, as I understand it, is related to failure. The fewer failures something presents, the more robust it is - as simple as that. Prevayler's robustness is bounded by the robustness of the VM and its serialization algorithm. Prevayler is so simple (564 lines including comments, javadoc and blank lines) that you could probably write a formal proof for it. ** I have tried both, but please don't take my word for it. Try them out too.
"Since Prevayler is also simpler to use, what is the advantage of MySQL?" Some people like SQL and the relational model. MySQL is a relational database manager with an SQL interface. Prevayler is not.
Re: Java Flushing to Disk, posted 26 Dec 2001 at 19:17 UTC by KlausWuestefeld »
Klaus, as a matter of interest, how did you manage to get Java to force flushing to disk?
FileOutputStream.getFD().sync()
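For context, a minimal sketch of how that call fits into the logging write path (the class and method names are illustrative):

import java.io.*;

class SyncedLogWriteSketch {
    static void writeAndSync(FileOutputStream fos, ObjectOutputStream oos, Object command)
            throws IOException {
        oos.writeObject(command);
        oos.flush();        // push Java-side buffers out to the OS
        fos.getFD().sync(); // force the OS to write the bytes to the physical disk
    }
}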
Re: Interesting but problematic, posted 26 Dec 2001 at 19:24 UTC by KlausWuestefeld »
Thank you (Klaus Wuestefeld) for your nice write-up.
You are welcome.
Let me discuss/ask some further points: distributed systems? If a system-prevalence-deployed application contacts other services or other servers, you have a synchronization problem. How do you handle that? I guess you end up doing 2PC-like synchronization between your prevalence servers.
I didn't understand the question very well.
Fine-grained tx-model versus all-or-nothing? In big business-object systems there are actually lots of small transactions. The system prevalence paradigm doesn't give you fine-grained application-side control, or does it?
No it doesn't. I believe that to be inefficient and unnecessary. Maybe we could discuss an example where you think it might be necessary.
Note, though, that you can adapt (extended) 2PC transactions to work efficiently RAM-based while retaining persistent storage properties (by using RAM-based subtransactions and files or an RDBMS in the root transaction).
Yes. I know. Three years ago, I wrote an object-relational persistence layer for Java that had nested transactions in RAM and an optional* RDBMS in the root transaction.
* You could run everything in RAM if you wanted. That was good for presentations, developing without database configuration hassle and running test scripts very fast.
Scalability? Having all "commands queued and routed through a single place" doesn't scale very well. Consider one of these big 64-processor, multi-gigabyte machines using a gigabit card: you wouldn't want all requests to be serialized through a single bottleneck which involves I/O.
Make sure you let the people using ORACLE (and its redo log files) know about that. ;)
With fine grained distributed transactions you don't need this "single place" or even a single server.
Sounds interesting. Could you elaborate and give an example?
I appreciate the "do it in background" approach, though, as an advance to requiring requests to be queued while saving the state. It's quite neccesary for 24/7 systems.
Was it clear to you, from the article, that your prevalent system DOES NOT have to stop in order to save its state?
Re: testing, debugging, integration, and data migration, posted 26 Dec 2001 at 19:31 UTC by KlausWuestefeld »
I used to be a professional SmallTalk programmer, ...
Me too, for 5 years. :)
The simplicy of "just saving the system state" is a double-edged sword. The downside is that it is often hard to specify a particular system state that you might want to use for testing or debugging. If you ever get an object into a "bad state", it can be very hard to find out how it got into that state.
In the prevalent scheme, with some daily system snapshots, you can retrieve the system's state before it "got bad"; and with the command logs you can actually replay your commands one-by-one until you get to the rotten one. Of course, I am supposing you have a decent "object encapsulation breaker" FOR DEBUGGING PURPOSES ONLY.
I know there aren't many of those around (compared to SQL-based tools) but that is more of a cultural problem, I believe. As you say, people are used to rows and columns. They like to break their systems' encapsulation with SQL tools and, at the same time, they like to complain: "Where are all the benefits object orientation has promised us?". ;)
What can you do? I expect things like Prevayler to gradually break this vicious circle.
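To make the replay idea above concrete, a hypothetical debugging loop might look like this (Command, executeOn and inspect are assumed names):

import java.io.*;

interface Command extends Serializable {
    void executeOn(Object businessSystem);
}

class ReplayDebuggerSketch {
    // Start from a snapshot taken before the system "got bad", then replay
    // the logged commands one by one, inspecting state after each step.
    static void replay(Object businessSystem, File commandLogFile) throws Exception {
        ObjectInputStream log = new ObjectInputStream(new FileInputStream(commandLogFile));
        try {
            while (true) {
                Command command = (Command) log.readObject();
                command.executeOn(businessSystem);
                inspect(businessSystem, command); // breakpoint here to catch the rotten one
            }
        } catch (EOFException endOfLog) {
        }
        log.close();
    }

    static void inspect(Object businessSystem, Object lastCommand) {
        // e.g. assert invariants or dump state for diffing
    }
}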
Lastly, upgrades were always a pain in image-based tools. Very incremental changes (like adding an instance variable to a class) can be handled by the serialization system. Any reorganization beyond that would require custom coding. In contrast, you can do small and mid-sized reorganizations a lot more easily in SQL.
My team and I would always do our migrations in Smalltalk (I wrote an object-relational persistence layer for Smalltalk 6 years ago). We would only use SQL or PL as a last resort and for performance reasons. With all your objects in RAM, that is a different story... ;)
Re: Why bother with disk at all?, posted 26 Dec 2001 at 19:33 UTC by KlausWuestefeld »
You can certainly go for RAM all the way and have several replicas, if you can afford it.
I could not agree more with egnor.
Just a comment on the "copy-on-write VM tricks" to "soften the need to entirely freeze a replica during checkpointing": it is a bit complicated dealing with executing threads, because your memory might never be in a consistent state at any given moment in time. The orthogonal persistence guys (like the ones mentioned in the article) have not figured out how to solve this problem.
With prevalence, the problem simply doesn't exist.
Re: XML Serialization, posted 26 Dec 2001 at 19:34 UTC by KlausWuestefeld »
A colleague of mine is fiddling with several XML-serialization libraries because he wants to include that in Prevayler.
Re: Processing speed doesn't increase consistency, posted 26 Dec 2001 at 19:37 UTC by KlausWuestefeld »
The point about speed is that, if every transaction is extremely fast, you do not have to handle concurrent transactions. That makes life MUCH easier. I am not only talking about sheer RAM processing speed increase, mind you. I am talking about a design change. I shall explain it in one of the following comments.
The ACID properties do remain.
Re: Trademarks, posted 26 Dec 2001 at 19:38 UTC by KlausWuestefeld »
"PREVAYLER" and "OPEN-SOURCE PREVALENCE LAYER" are trademarks of Klaus Wuestefeld in the same way that "Linux" is a trademark of Linus Torvalds.
They are not REGISTERED trademarks though. Much like a copyright, you do not have to register it to be entitled to a trademark.
Of course, the suits will always tell you that it is better to register.
Serialization Throughput Test, posted 26 Dec 2001 at 19:47 UTC by KlausWuestefeld »
How fast does serialization run on your machine?
import java.io.*;

public class SerializationThroughput {
    static public void main(String[] args) {
        try {
            FileOutputStream fos = new FileOutputStream(new File("tmp.tmp"));
            ObjectOutputStream oos = new ObjectOutputStream(fos);
            Thread.sleep(5000); //Wait for any disk activity to stop.
            long t0 = System.currentTimeMillis();
            int max = 10000;
            int i = 0;
            while (i++ < max) {
                oos.writeObject(new Integer(i));
                oos.reset();
                oos.flush();
                fos.getFD().sync(); //Forces flushing to disk. :)
            }
            System.out.println("This machine can serialize " + max * 1000 / (System.currentTimeMillis() - t0) + " Integers per second.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
My 450MHz K6-II running Windows 98 with a 3-year-old IDE hard drive gives me the following result: "This machine can serialize 576 Integers per second."
Does anyone give me more? :)
PREVALENCE IN THE LARGE, posted 26 Dec 2001 at 20:11 UTC by KlausWuestefeld »
OK, here we go:
I shall leave automatic load-balancing aside for now and concentrate on the concerns we already have.
Atomicity and Crash-Recovery
This is already covered in the article.
Consistency and Error-Conditions
Every command is executed on its own. The business system must either check for inconsistencies before it starts executing any command or be able to undo whatever changes were done if it runs into an inconsistency. In my designs I prefer the first approach. The demo application included with Prevayler has good examples.
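A sketch of that first approach, with invented names: the command checks ALL its preconditions before touching anything, so it either applies completely or leaves the system untouched.

import java.io.*;

interface PersonDirectory {
    boolean hasName(String name);
    void rename(String personId, String newName);
}

class ChangeNameCommand implements Serializable {
    private final String personId, newName;

    ChangeNameCommand(String personId, String newName) {
        this.personId = personId;
        this.newName = newName;
    }

    void executeOn(PersonDirectory directory) {
        // 1. Check every precondition up front...
        if (newName == null || newName.length() == 0)
            throw new IllegalArgumentException("Invalid name.");
        if (directory.hasName(newName))
            throw new IllegalArgumentException("Duplicate name.");
        // 2. ...only then mutate: no partial change can ever be left behind.
        directory.rename(personId, newName);
    }
}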
Isolation
While a client is preparing a command to be executed, no other client can see what that command is all about.
Durability
The snapshots and command logs guarantee your persistence. If you use replicas, as described in the article, your system shall not only persist, it shall prevail.
Scalability and Performance
Suppose we have a multi-threaded system in which all threads do all of the three following things:
1) Client stuff - Waiting for an HTTP request; Waiting for an RMI request; Reading a file; Preparing a command to be executed; Writing a file; Generating HTML; Painting a GUI screen; etc...
2) Prevayler stuff - Logging a command to a file. (This is the only thing Prevayler does on the hot system during execution. The snapshot is taken by the replica and has no impact here.)
3) Business stuff - Processing a command; Evaluating a query.
For simplicity, Prevayler's implementation, today, will synchronize "Logging a command" and "Processing a command" in a single go. That is not necessary though. The only conditions we have to meet are:
- All commands are logged.
- All commands are executed after they are logged.
- All commands are executed in the same order as they are logged.
Using two producer-consumer queues would already alleviate that a little. The main problems, though, are still:
- It might take a long time to serialize certain large commands and Prevayler doesn't serialize and log more than one command at a time.
- The business system cannot process more than one command at a time.
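To illustrate the two-queue idea under the three conditions above, here is a sketch with invented names; note that it still suffers from both problems just listed:

import java.io.*;
import java.util.*;

interface Command extends Serializable {
    void executeOn(Object businessSystem);
}

class LogThenExecuteSketch {
    private final ObjectOutputStream log;
    private final LinkedList pending = new LinkedList(); // logged but not yet executed

    LogThenExecuteSketch(ObjectOutputStream log) { this.log = log; }

    // Called by client threads: log first, then queue, in one atomic step.
    synchronized void submit(Command command) throws IOException {
        log.writeObject(command); // 1. every command is logged...
        log.flush();
        pending.addLast(command); // 2. ...and only then queued, in log order
        notify();
    }

    // Run by a single executor thread: commands execute one at a time,
    // in exactly the order they were logged.
    void executorLoop(Object businessSystem) throws InterruptedException {
        while (true) {
            Command next;
            synchronized (this) {
                while (pending.isEmpty()) wait();
                next = (Command) pending.removeFirst();
            }
            next.executeOn(businessSystem);
        }
    }
}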
The first problem is easy to solve. 4096 (or more) "slave" log files could be used to serialize and log up to 4096 (or more) SIMULTANEOUS COMMANDS. All that is needed is a "master" log file indicating in which "slave" log file each command was serialized (it is not even necessary that the first command that started being logged be the first one to finish). In terms of scalability and throughput, this is as much as you can get even in an RDBMS like ORACLE, because of its redo log files.
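One possible arrangement of those master and slave logs, entirely illustrative: the expensive serializations run in parallel while a tiny master record fixes the global order (recovery would replay commands by sequence number, so master entries need not be written in order).

import java.io.*;

class ParallelLoggerSketch {
    private final ObjectOutputStream[] slaves;
    private final DataOutputStream master;
    private long nextSequence = 0;

    ParallelLoggerSketch(ObjectOutputStream[] slaves, DataOutputStream master) {
        this.slaves = slaves;
        this.master = master;
    }

    void log(Serializable command) throws IOException {
        long sequence;
        int slave;
        synchronized (this) { // only this cheap bookkeeping is single-threaded
            sequence = nextSequence++;
            slave = (int) (sequence % slaves.length);
        }
        synchronized (slaves[slave]) { // big serializations proceed in parallel
            slaves[slave].writeObject(command);
            slaves[slave].flush();
        }
        synchronized (master) { // a small record fixes the global command order
            master.writeLong(sequence);
            master.writeInt(slave);
            master.flush();
        }
    }
}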
Take a look at the "Serialization Throughput Test" above, to see how well your machine would do as a "master logger". :)
All these performance enhancements are already scheduled for future Prevayler releases. If anyone is considering using Prevayler on a project for a system that actually needs them already, I will be glad to implement them sooner (or integrate someone else's implementation) and help out on the project design.
All other thread activities, including query evaluation, mind you, can already be processed in parallel. So, you can have as many processors as your VM, OS and hardware will support.
On to the second problem: "The business system cannot process more than one command at a time.".
To overcome that, then, we will establish a simple rule: "The business system cannot take more than a few MICROSECONDS to run any single command."
"Oh no! I knew it! This guy is crazy!", some might think, "How can I possibly process 100000 payment records in only a few microseconds?".
For 99% of your commands, like changing a person's name, you check for inconsistencies (invalid name, duplicate name, etc), and then you just execute it normally. With your objects in RAM, that will only take a few microseconds anyway.
For 1% of your commands (the hairy ones), like processing a batch payment with 100000 payments, lazy evaluation is the key: your system simply doesn't process the command. Instead, it just keeps the command in the "batch payments" list for future evaluation.
The command will be processed bit-by-bit whenever a query is evaluated regarding that command. It is important to note that, while the client is building the command, the command is internally preparing its structure to be kept in the system without further processing. Remember: a prevalent command is much more than an atomic set of operations. It is a full-fledged object and can be responsible for much of the system's business intelligence! The batch payment command, for example, would keep all payment records internally in a HashMap with contract id as the key.
Suppose you then query the payment status of any given contract. The contract will ask itself: "When was the last time I updated my payment status?". It will then look at the "batch payments" list (there are two or three batch payments a month): "Were there any batch payments since my last update?". If there were, the contract updates itself accordingly (one HashMap lookup per batch). Then, the contract simply returns its payment status. This all takes only a few microseconds too.
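A sketch of that lazy evaluation, with all names invented for illustration:

import java.io.*;
import java.util.*;

// The batch command is kept, not processed: it already holds its payments in
// a HashMap keyed by contract id, built while the client filled the command.
class BatchPaymentCommand implements Serializable {
    private final HashMap paymentsByContractId;

    BatchPaymentCommand(HashMap paymentsByContractId) {
        this.paymentsByContractId = paymentsByContractId;
    }

    Object paymentFor(String contractId) {
        return paymentsByContractId.get(contractId); // one lookup per batch
    }
}

class Contract implements Serializable {
    private final String id;
    private int batchesApplied = 0; // how many batches this contract has caught up with

    Contract(String id) { this.id = id; }

    // Queried for payment status: first catch up with any batches received
    // since the last update, then answer.
    Object paymentStatus(List allBatchPayments) {
        while (batchesApplied < allBatchPayments.size()) {
            BatchPaymentCommand batch =
                    (BatchPaymentCommand) allBatchPayments.get(batchesApplied++);
            applyPayment(batch.paymentFor(id));
        }
        return currentStatus();
    }

    private void applyPayment(Object payment) {
        if (payment == null) return; // this contract was not in that batch
        // update balance, status, etc.
    }

    private Object currentStatus() { return "up to date"; } // illustrative
}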
You could have a query, though, that actually depends on the processing of ALL the payments (e.g. "Total Monthly Revenue"). In this case, the query AND ONLY THIS QUERY will take about 2 seconds* to execute. All the rest of the system continues working at full speed and with full availability.
*Today, my company has an ORACLE-based billing system running on big Solaris boxes that takes 62.5 machine hours to process 100000 payment records. We estimate that doing it all in RAM would take no more than 2 seconds (on my desktop machine, mind you).
Are there any more doubts or are all your systems already prevalent? ;)
Re: Trademarks, posted 26 Dec 2001 at 20:14 UTC by dalke »
They are not REGISTERED trademarks though. Much like a copyright, you do not have to register it to be entitled to a trademark.
Ahh, thank you. The USPTO link for that is: http://www.uspto.gov/web/offices/tac/tmfaq.htm#Basic001.
Do I need to register my trademark? No..
Also, What are the benefits of federal trademark registration?
Constructive notice nationwide of the trademark owner's claim.
Evidence of ownership of the trademark.
Jurisdiction of federal courts may be invoked.
Registration can be used as a basis for obtaining registration in foreign countries.
Registration may be filed with U.S. Customs Service to prevent importation of infringing foreign goods.
"PREVAYLER" and "OPEN-SOURCE PREVALENCE LAYER" are trademarks of Klaus Wuestefeld in the same way that "Linux" is a trademark of Linus Torvalds.
Umm, except that Linus owns the registered trademark on Linux, serial number 74560867 at uspto.gov. There was a big hoorah about this some five years ago when someone other than Linus registered the term for himself. Some of the links about the topic are mentioned at http://www.linux10.org/history/ .
Of course, the suits will always tell you that it is better to register.
Most "suits" would say that if you have the $325/10 years and don't want to go through the hassle of defending your mark if your work becomes popular, then it's worth it.
Thoughts about Prevayler, posted 27 Dec 2001 at 02:39 UTC by Gandhi »
I'm using Prevayler in a beta system I'm developing, and I think the main problem when you expose this kind of system is that you don't have studies saying whether it's right or not.
Of course a lot of people thought about this before Klaus, but has anyone really made a serious study of the most common actions (procedures) performed for each category of application?
What is the best application category for Prevayler?
Does anybody know the REAL consistency of the systems on the market?
Don't you think inconsistency in 99% of cases is just the result of bad code at the top layer? Can't we just make a fault-tolerant system and keep the system working, no matter how bad a coder the guy is?
The new Java implementations (1.3 and 1.4) have new classes that allow high-speed messaging pipes between applications. Can you imagine a better use for these pipes?
I agree that XML serialization is a good thing, mainly for debugging purposes and its atomicity, but how can you compress it? And if you compress it, why keep it as XML?
I think a better serialization scheme alone should do the trick, with compression, cryptography, and a hierarchical system that could easily allow XML translation. Externalizable methods would do the job. Any volunteers?
One easy question: is it a framework? Is there a planned plugin structure? Will everything be done through interfaces? No registering classes or similar approaches?
[]s, gandhi.
Prevayler Plug-ins, posted 29 Dec 2001 at 04:06 UTC by KlausWuestefeld »
One easy question. Is it a framework?
Not at present.
Is there a planned plugin structure?
No. Can there be a plugin structure in the future? Yes.
There is no design trait in Prevayler based on predictions for the future. Prevayler's design, at any point in time, will be the simplest design that we can achieve and that satisfies all CURRENT requirements. The goal is anticlimactic simplicity.
Don't worry. Thanks to simplicity, the day you write the first plug-in for Prevayler, we will easily find a way to "plug it in". The day you write your third Prevayler plug-in, there will certainly be a "plug-in structure" in place.
That is the beauty of open-source and that is the beauty of simple design.
To Be Continued..., posted 29 Dec 2001 at 04:16 UTC by KlausWuestefeld »
Anyone interested in knowing more about prevalence or in further discussing the subject (but not necessarily having Advogato certification) take a look at the Prevayler Forum.
See you there, Klaus.
orthogonal persistence, posted 29 Dec 2001 at 15:44 UTC by jerry »
Askemos has a similar take on persistence. Just not "all in memory" but "always saved to file" - after each transaction in any of your objects.
Serialization Throughput for Larger Objects, posted 30 Dec 2001 at 18:08 UTC by Ward »
I generalized the throughput test to write records of various size. For small records the time is dominated by the flush; for large ones, transfer time. I found the knee of this classic curve to be at about 300 Integers (3k bytes) on a Windows platform and 100 Integers on a Linux. All but one machine I tested showed other behaviour that I cannot explain. I've written a short note with graphs and the revised test source code.
Fine, what about Garbage Collection?, posted 3 Jan 2002 at 00:10 UTC by jonabbey »
I designed and implemented a RAM-based, transactional database in Java years ago for Ganymede, and I can attest that keeping everything in memory works splendidly. Add a transaction log for recovery, and you're cooking with gas.
At least, that is, for reasonably small datasets. The big open question for Ganymede, and for any memory-resident Java database system, is how big a cost garbage collection becomes when you scale up. Using the operating system's native VM subsystem to handle disk paging works fine, but when the garbage collector has to sweep through everything periodically in order to clean up garbage, that sweep presumably has to do a good bit of paging to take care of things.
Do you have any insight into how serious a problem this is? Ganymede works fantastically well for us at the scale we need it to, but I've always imagined (but not tested) that putting a gigabyte of directory data into it would probably not work so terribly well.
Re: Garbage Collection (Raising the Bar), posted 3 Jan 2002 at 01:34 UTC by KlausWuestefeld »
I ran a few tests creating huge arrays of Integers and serializing them to stress the limits of some VMs. Every time we increased the size of the array to a point where the system started paging, we simply had to abort the test after a few hours because we couldn't stand waiting any longer. 55 million was the max we reached without paging, running on an HP-UX machine (thanks to the guys at HP/Porto Alegre/Brazil).
The prevalence hypothesis, though, is that you have enough RAM for all your data so, even when the garbage collector kicks in, your system shouldn't have to page to disk.
Even if you have enough RAM, the garbage collector can be a nuisance in many large systems and a real show-stopper for time-sensitive critical systems. I am not an expert but it seems that most VMs use a mix of generational garbage collection and traditional mark-and-sweep. I really would like to see some three-colouring going on anytime soon (if you know of anything about this please post here).
A very popular VM's heap size won't even reach 1GB. (It will allow you to set the parameter but will shamelessly ignore it if it is above a certain limit). It seems that VMs like that one are targeted only at feeble client code.
I believe that projects using Prevayler will actually raise the bar for VM robustness, heap size and garbage collection performance.