open source fast key-value storage library
Posted On 2011年9月28日
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
Keys and values are arbitrary byte arrays.
Data is stored sorted by key.
Callers can provide a custom comparison function to override the sort order.
The basic operations are Put(key,value), Get(key), Delete(key).
Multiple changes can be made in one atomic batch.
Users can create a transient snapshot to get a consistent view of data.
Forward and backward iteration is supported over the data.
Data is automatically compressed using the Snappy compression library.
External activity (file system operations etc.) is relayed through a virtual interface so users can customize the operating system interactions.
Detailed documentation about how to use the library is included with the source code.
This is not a SQL database. It does not have a relational data model, it does not support SQL queries, and it has no support for indexes.
Only a single process (possibly multi-threaded) can access a particular database at a time.
There is no client-server support builtin to the library. An application that needs such support will have to wrap their own server around the library.
Here is a performance report (with explanations) from the run of the included db_bench program. The results are somewhat noisy, but should be enough to get a ballpark performance estimate.
We use a database with a million entries. Each entry has a 16 byte key, and a 100 byte value. Values used by the benchmark compress to about half their original size.
LevelDB: version 1.1
Date: Sun May 1 12:11:26 2011
CPU: 4 x Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
CPUCache: 4096 KB
Keys: 16 bytes each
Values: 100 bytes each (50 bytes after compression)
Raw Size: 110.6 MB (estimated)
File Size: 62.9 MB (estimated)
The “fill” benchmarks create a brand new database, in either sequential, or random order. The “fillsync” benchmark flushes data from the operating system to the disk after every operation; the other write operations leave the data sitting in the operating system buffer cache for a while. The “overwrite” benchmark does random writes that update existing keys in the database.
fillseq : 1.765 micros/op; 62.7 MB/s
fillsync : 268.409 micros/op; 0.4 MB/s (10000 ops)
fillrandom : 2.460 micros/op; 45.0 MB/s
overwrite : 2.380 micros/op; 46.5 MB/s
Each “op” above corresponds to a write of a single key/value pair. I.e., a random write benchmark goes at approximately 400,000 writes per second.
Each “fillsync” operation costs much less (0.3 millisecond) than a disk seek (typically 10 milliseconds). We suspect that this is because the hard disk itself is buffering the update in its memory and responding before the data has been written to the platter. This may or may not be safe based on whether or not the hard disk has enough power to save its memory in the event of a power failure.