This package contains the public API and an introduction.
JDBM intro
Key-Value databases have received a lot of attention recently, but their history is much older. GDBM (a predecessor of JDBM)
traces its roots to the 1970s and was called a 'pre-relational' database. JDBM has been under development since 2000.
Version 1.0 has been in production since 2005 with only a few bugs reported. Version 2.0 adds some features on top of JDBM
(most importantly java.util.Map views).
The goal of JDBM 2.0 is to provide simple and fast persistence. It is very simple to use, has minimal overhead, and the
standalone JAR takes only 130KB. It is an excellent choice for a Swing application or an Android phone. JDBM also handles
huge datasets well and can be used for data processing (the author uses it to process astronomical data).
The source code is not complicated; it is quite readable and can also be used for teaching.
On the other hand, it lacks some important features (concurrent scalability, multiple transactions, annotations,
clustering...), which is the reason why it is so simple and small. For example, multiple transactions would introduce a new dimension of problems, such as concurrent updates, optimistic/pessimistic record locking, etc.
JDBM does not try to replicate Voldemort, HBase or other more advanced Key-Value databases.
JDBM2 is
Not a SQL database
JDBM2 is more low-level. With this comes great power (speed, resource usage, no ORM)
but also great responsibility. You are responsible for data integrity, partitioning, typing, etc.
An excellent embedded SQL database is H2 (in fact, it is faster than JDBM2 in many cases).
Not an Object database
The fact that JDBM2 uses serialization may give you a false sense of security. It does not
magically split a huge object graph into smaller pieces, nor does it handle duplicates.
With JDBM you may easily end up with a single instance being persisted in several copies across a datastore.
An object database would do this magic for you, as it traverses object graph references and
makes sure there are no duplicates in a datastore. Have a look at NeoDatis or DB4o.
Not at enterprise level
The JDBM2 codebase is probably very good and free of bugs, but it is a community project. You may easily end up without
support. For something more enterprise-grade, have a look at Berkeley DB Java Edition from Oracle. BDB has more
features, is more robust, has better documentation and a bigger overhead, and comes with a price tag.
Not distributed
Key-Value databases are associated with distributed stores, map reduce, etc. JDBM is not distributed; it runs on a single computer only.
It does not even have a network interface and cannot act as a server.
You are probably looking for Voldemort.
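The duplicate-copy caveat from 'Not an Object database' can be reproduced with plain Java serialization. The sketch below uses only java.io; the Person and Address classes are made up for illustration and are not part of JDBM. Each value is serialized on its own, the way a key-value store would treat two independent records.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class SerializationDuplicates {
    // Hypothetical domain classes, used only for this illustration.
    static final class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        final String city;
        Address(String city) { this.city = city; }
    }

    static final class Person implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final Address address;
        Person(String name, Address address) { this.name = name; this.address = address; }
    }

    // Serializes one value into its own byte[] (one independent record).
    static byte[] toBytes(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Two persons share one Address in memory; after storing them as
    // separate values, each person carries its own private copy.
    static boolean sharedAfterRoundTrip() {
        Address shared = new Address("Prague");
        Person alice = (Person) fromBytes(toBytes(new Person("Alice", shared)));
        Person bob = (Person) fromBytes(toBytes(new Person("Bob", shared)));
        return alice.address == bob.address;
    }
}
```

Calling SerializationDuplicates.sharedAfterRoundTrip() returns false: the shared Address was written twice and comes back as two unrelated instances. Avoiding this is the application's job, for example by storing the Address under its own key and keeping only that key in each Person.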
JDBM2 overview
JDBM2 has some helpful features that make it easier to use. They also bring it closer to SQL and help with data
integrity checks and data queries.
Low level page store
This is a Key-Value database in the literal sense. The key is a record identifier number (recid) which points to a location in the file.
Since the recid is a physical pointer, new key values must be assigned by the store (wherever free space is found).
The value can be any object that is serializable to a byte[] array. The page store also provides transactions and a cache.
Named objects
A number is not a very practical identifier, so there is a table that translates Strings to recids. This is the recommended
approach for persisting singletons.
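The two layers described so far can be pictured with a small in-memory model. This is only a sketch of the idea, not JDBM's implementation: the class and method names (RecidStoreSketch, insert, fetch, bindName, lookupName) are hypothetical, a HashMap stands in for the file, and a counter stands in for the free-space search.

```java
import java.util.HashMap;
import java.util.Map;

final class RecidStoreSketch {
    private final Map<Long, byte[]> records = new HashMap<>(); // stands in for file pages
    private final Map<String, Long> names = new HashMap<>();   // name -> recid table
    private long nextRecid = 1;                                // 0 is reserved for "not found"

    // The store, not the caller, chooses the key for a new record.
    long insert(byte[] serializedValue) {
        long recid = nextRecid++;
        records.put(recid, serializedValue.clone());
        return recid;
    }

    // Returns a copy of the stored bytes, or null if the recid is unknown.
    byte[] fetch(long recid) {
        byte[] data = records.get(recid);
        return data == null ? null : data.clone();
    }

    // Gives a record a human-friendly name.
    void bindName(String name, long recid) {
        names.put(name, recid);
    }

    // Returns the recid bound to the name, or 0 if there is none.
    long lookupName(String name) {
        Long recid = names.get(name);
        return recid == null ? 0L : recid;
    }
}
```

A singleton such as application settings would be inserted once, bound to a name like "settings", and later found again through lookupName followed by fetch, without the application having to remember the numeric recid.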
Primary maps
{@link jdbm.PrimaryStoreMap}, {@link jdbm.PrimaryTreeMap} and {@link jdbm.PrimaryHashMap} implement the java.util.Map
interface from Java Collections, but they use the page store for persistence. So you can create a HashMap with billions of items and worry only about the commits.
Secondary maps
Secondary maps (indexes) provide side information and associations for the primary map. For example, if there is a Person class persisted in the primary map,
the secondary maps can provide fast lookup by name, address, age... The secondary maps are 'views' of the primary map and are read-only.
They are updated by the primary map automatically.
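The relationship between a primary map and a secondary index can be sketched with standard collections. This is an in-memory illustration under stated assumptions, not JDBM's API: IndexedMapSketch and its methods are hypothetical names, and the extractor function that derives the secondary key from a value is an assumed design.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;

final class IndexedMapSketch<K, V, S extends Comparable<S>> {
    private final Map<K, V> primary = new HashMap<>();
    private final NavigableMap<S, Set<K>> secondary = new TreeMap<>();
    private final Function<V, S> extractor; // derives the secondary key from a value

    IndexedMapSketch(Function<V, S> extractor) {
        this.extractor = extractor;
    }

    // Every write goes through the primary map, which keeps the index in sync.
    V put(K key, V value) {
        V old = primary.put(key, value);
        if (old != null) {
            unindex(key, old);
        }
        secondary.computeIfAbsent(extractor.apply(value), s -> new HashSet<>()).add(key);
        return old;
    }

    V remove(K key) {
        V old = primary.remove(key);
        if (old != null) {
            unindex(key, old);
        }
        return old;
    }

    V get(K key) {
        return primary.get(key);
    }

    // Read-only view: callers can look up primary keys but cannot modify the index.
    Set<K> primaryKeysFor(S secondaryKey) {
        Set<K> keys = secondary.get(secondaryKey);
        return keys == null ? Collections.<K>emptySet() : Collections.unmodifiableSet(keys);
    }

    private void unindex(K key, V value) {
        S secondaryKey = extractor.apply(value);
        Set<K> keys = secondary.get(secondaryKey);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) {
                secondary.remove(secondaryKey);
            }
        }
    }
}
```

With values that are person names and an extractor such as String::length, put(1, "Ann") followed by primaryKeysFor(3) yields a set containing 1. Replacing or removing the entry through the primary map moves or drops the index entry without any extra call.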
Cache
JDBM has an object instance cache. This reduces serialization time and disk IO. By default JDBM uses a SoftReference cache. If the JVM has
less than 50MB of heap space available, a fixed-size MRU (Most Recently Used) cache is used instead.
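The two caching strategies can be sketched with the standard library. These classes are illustrative stand-ins, not JDBM's cache implementation, and their names are hypothetical. The fixed-size variant assumes that "MRU cache" means the most recently used entries are kept and the least recently used one is evicted when the cache is full.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Values stay cached until the garbage collector needs the memory back.
final class SoftCacheSketch<V> {
    private final Map<Long, SoftReference<V>> cache = new HashMap<>();

    void put(long recid, V value) {
        cache.put(recid, new SoftReference<>(value));
    }

    // Returns the cached instance, or null if it was never cached or was collected.
    V get(long recid) {
        SoftReference<V> ref = cache.get(recid);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            cache.remove(recid); // the referent was reclaimed; drop the stale entry
        }
        return value;
    }

    void clear() {
        cache.clear();
    }
}

// Bounded cache: an access-ordered LinkedHashMap that drops the eldest entry when full.
final class FixedSizeCacheSketch<V> extends LinkedHashMap<Long, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    FixedSizeCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = iteration order follows access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, V> eldest) {
        return size() > capacity;
    }
}
```

The soft cache adapts to the available heap but gives no guarantee about when memory is released. The fixed-size cache has a predictable footprint, which suits small heaps.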
Transactions
JDBM provides transactions with commit and rollback. The transaction mechanism is safe and tested (in use for the last 5 years). JDBM allows only
a single transaction at a time, so there are no problems with concurrent updates and locking.
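The commit and rollback behaviour can be modelled in a few lines. This is a simplified in-memory sketch, not JDBM's transaction manager: SingleTransactionSketch and its methods are hypothetical, values are plain Strings, and synchronized methods stand in for the single-transaction rule.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class SingleTransactionSketch {
    private final Map<Long, String> committed = new HashMap<>();
    // Pending changes live in memory only; an empty Optional marks a delete.
    private final Map<Long, Optional<String>> pending = new HashMap<>();

    synchronized void update(long recid, String value) {
        pending.put(recid, Optional.of(value));
    }

    synchronized void delete(long recid) {
        pending.put(recid, Optional.empty());
    }

    // Reads see the pending change first, then the last committed state.
    synchronized String fetch(long recid) {
        Optional<String> change = pending.get(recid);
        return change != null ? change.orElse(null) : committed.get(recid);
    }

    // Applies all pending changes to the committed state.
    synchronized void commit() {
        for (Map.Entry<Long, Optional<String>> e : pending.entrySet()) {
            if (e.getValue().isPresent()) {
                committed.put(e.getKey(), e.getValue().get());
            } else {
                committed.remove(e.getKey());
            }
        }
        pending.clear();
    }

    // Discards everything since the last commit.
    synchronized void rollback() {
        pending.clear();
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```

Because the pending map grows until the next commit, a long run of updates without a commit steadily consumes heap in this model.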
10 things to keep in mind
- Uncommitted data are stored in memory; if you get an OutOfMemoryError, you have to commit more frequently.
- Keys and values are stored as part of the index nodes. They are instantiated each time the index is searched.
If you have larger values (>512 bytes), these may hurt performance and cause an OutOfMemoryError.
In this case use {@link jdbm.helper.StoreReference} or {@link jdbm.PrimaryStoreMap} to store values outside of the index.
- If you run into performance problems, use a profiler rather than asking about it on the internet.
- JDBM caches returned object instances. If you modify an object (for example, set a new name on a person),
the next time RecordManager may return the same instance, including this modification.
- Iteration over Maps is not guaranteed to behave correctly if the map changes during iteration
(for example, when a new entry is added). There is no fail-fast policy yet,
so all iterations over Maps should be synchronized on RecordManager.
- More memory means better performance; use -Xmx000m generously. JDBM has a good SoftReference cache.
- The SoftReference cache may hold memory that other tasks need. The memory is released automatically, but it may take longer than you expect.
Consider clearing the cache manually with RecordManager.clearCache() before starting a new type of task.
- It is safe not to close the db before exiting, but if you do that, there will be a long cleanup upon the next start.
- JDBM may have problems reclaiming free space after many records are deleted or updated. You may want to run
RecordManager.defrag() from time to time.
- A Key-Value db does not support N-M relations easily. It takes a lot of care to handle them correctly.
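The note about cached object instances is the easiest one to trip over, so here is a minimal model of it. The sketch is not JDBM code: InstanceCachePitfall and Person are hypothetical, a map of Strings stands in for the serialized records on disk, and clearCache only mirrors the name of the RecordManager method mentioned above.

```java
import java.util.HashMap;
import java.util.Map;

final class InstanceCachePitfall {
    // Hypothetical mutable value class.
    static final class Person {
        String name;
        Person(String name) { this.name = name; }
    }

    private final Map<Long, String> disk = new HashMap<>();  // stands in for serialized records
    private final Map<Long, Person> cache = new HashMap<>(); // stands in for the instance cache

    // Writes the current state to "disk" and caches the instance.
    void update(long recid, Person person) {
        disk.put(recid, person.name);
        cache.put(recid, person);
    }

    // Returns the cached instance if present; otherwise rebuilds it from "disk".
    Person fetch(long recid) {
        Person cached = cache.get(recid);
        if (cached != null) {
            return cached;
        }
        String stored = disk.get(recid);
        if (stored == null) {
            return null;
        }
        Person fresh = new Person(stored);
        cache.put(recid, fresh);
        return fresh;
    }

    void clearCache() {
        cache.clear();
    }
}
```

If a Person stored as "Ann" is fetched and renamed to "Bob" without calling update, the next fetch still returns "Bob" because the same instance comes from the cache. After clearCache the stored "Ann" reappears. Treating fetched objects as read-only unless you intend to write them back avoids this kind of surprise.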