API Documentation¶
-
class
lsm.
LSM
¶ Python wrapper for SQLite4’s LSM implementation.
http://www.sqlite.org/src4/doc/trunk/www/lsmapi.wiki
-
__contains__
¶ Return a boolean indicating whether the given key exists.
-
__delitem__
¶ Dictionary API wrapper for the delete() and delete_range() methods.
Parameters: key – Either a string or a slice. Additionally, a second parameter can be supplied indicating what seek method to use.
Note
When deleting a range of keys, the start and end keys themselves are not deleted, only the intervening keys.
-
__enter__
()¶ Use the database as a context manager. The database will be closed when the wrapped block exits.
-
__getitem__
¶ Dictionary API wrapper for the fetch() and fetch_range() methods.
Parameters: key – Either a string or a slice. Additionally, a second parameter can be supplied indicating what seek method to use.
Examples using single keys:
- ['charlie'], search for the key charlie.
- ['2014.XXX', SEEK_LE], match the key equal to 2014.XXX. If no such key exists, match the greatest key that does not exceed 2014.XXX. If there is no lower key, then a KeyError will be raised.
- ['2014.XXX', SEEK_GE], match the key equal to 2014.XXX. If no such key exists, match the lowest key that does not precede 2014.XXX. If there is no higher key, then a KeyError will be raised.
Examples using slices (SEEK_LE and SEEK_GE cannot be used with slices):
- ['a':'z'], return all keys from a to z in ascending order.
- ['z':'a'], return all keys from z to a in reverse order.
- ['a':], return all key/value pairs from a on up.
- [:'z'], return all key/value pairs up to and including z.
- ['a'::True], return all key/value pairs from a on up in reverse order.
Note
When fetching slices, a KeyError will not be raised under any circumstances.
-
__init__
¶ Parameters: - filename (str) – Path to database file.
- open_database (bool) – Whether to open the database automatically when the class is instantiated.
- page_size (int) – Page size in bytes. Default is 4096.
- block_size (int) – Block size in KB. Default is 1024 (1MB).
- safety_level (int) – Safety level in face of crash.
- readonly (bool) – Open database for read-only access.
- enable_multiple_processes (bool) – Allow multiple processes to access the database. Default is True.
-
__iter__
¶ Efficiently iterate through the items in the database. This method yields successive key/value pairs.
Note
The return value is a generator.
-
__reversed__
()¶ Efficiently iterate through the items in the database in reverse order. This method yields successive key/value pairs.
-
autoflush
()¶ This value determines the amount of data allowed to accumulate in a live in-memory tree before it is marked as old. After committing a transaction, a connection checks if the size of the live in-memory tree, including data structure overhead, is greater than the value of this option in KB. If it is, and there is not already an old in-memory tree, the live in-memory tree is marked as old.
The maximum allowable value is 1048576 (1GB). There is no minimum value. If this parameter is set to zero, then an attempt is made to mark the live in-memory tree as old after each transaction is committed.
The default value is 1024 (1MB).
-
begin
()¶ Begin a transaction. Transactions can be nested.
Note
In most cases it is preferable to use the
transaction()
context manager/decorator.
-
checkpoint
()¶ Write to the database file header.
-
checkpoint_size
()¶ The number of KB written to the database file since the most recent checkpoint.
-
close
()¶ Close the database. If the database was already closed, this will return False, otherwise returns True on success.
Warning
You must close all cursors before attempting to close the db, otherwise an
IOError
will be raised.
-
commit
()¶ Commit the inner-most transaction.
Returns: Boolean indicating whether the changes were committed.
-
config_mmap
()¶ If this value is set to 0, then the database file is accessed using ordinary read/write IO functions. Or, if it is set to 1, then the database file is memory mapped and accessed that way. If this parameter is set to any value N greater than 1, then up to the first N KB of the file are memory mapped, and any remainder accessed using read/write IO.
-
cursor
()¶ Create a cursor and return it as a context manager. After the wrapped block, the cursor is closed.
Parameters: reverse (bool) – Whether the cursor will iterate over keys in descending order.
Example:
with lsm_db.cursor() as cursor:
    for key, value in cursor.fetch_range('a', 'z'):
        pass  # do something with data...

with lsm_db.cursor(reverse=True) as cursor:
    for key, value in cursor.fetch_range('z', 'a'):
        pass  # data is now in descending order.
Note
In general the cursor() context manager should be used as it ensures cursors are properly cleaned up when you are done using them. LSM databases cannot be closed as long as there are any open cursors, so it is very important to close them when finished.
-
delete
()¶ Remove the specified key and value from the database. If the key does not exist, no exception is raised.
Note
You can delete keys using Python’s dictionary API:
# These are equivalent:
lsm_db.delete('some-key')
del lsm_db['some-key']
-
delete_range
()¶ Delete a range of keys, though the start and end keys themselves are not deleted.
Parameters: - start (str) – Beginning of range. This key is not removed.
- end (str) – End of range. This key is not removed.
Rather than using delete_range(), you can use Python’s del keyword, specifying a slice of keys.
Example:
>>> for key in 'abcdef':
...     db[key] = key.upper()
>>> del db['a':'c']  # This will only delete 'b'.
>>> 'a' in db, 'b' in db, 'c' in db
(True, False, True)
>>> del db['0':'d']
>>> print(list(db))
[('d', 'D'), ('e', 'E'), ('f', 'F')]
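The exclusive-endpoint rule can be modelled in plain Python. The sketch below is not the library's implementation: `delete_range_model` is a hypothetical helper that operates on an ordinary dict and assumes keys compare as ordinary strings.

```python
def delete_range_model(data, start, end):
    """Remove every key strictly between start and end (hypothetical helper).

    Mirrors the documented rule that the boundary keys themselves survive.
    Returns the number of keys removed.
    """
    doomed = [key for key in data if start < key < end]
    for key in doomed:
        del data[key]
    return len(doomed)


db = {key: key.upper() for key in 'abcdef'}

delete_range_model(db, 'a', 'c')   # only 'b' lies strictly between 'a' and 'c'
remaining_after_first = sorted(db)

delete_range_model(db, '0', 'd')   # removes 'a' and 'c'; 'd' is a boundary and stays
remaining_after_second = sorted(db)
```

After the second call only 'd', 'e' and 'f' remain, matching the session shown above.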
-
fetch
()¶ Retrieve a value from the database.
Parameters: - key (str) – The key to retrieve.
- seek_method (int) – Instruct the database how to match the key.
Raises: KeyError if a matching key cannot be found. See below for more details.
The following seek methods can be specified.
- SEEK_EQ (default): match key based on equality. If no match is found, then a KeyError is raised.
- SEEK_LE: if the key does not exist, return the largest key in the database that is smaller than the given key. If no smaller key exists, then a KeyError will be raised.
- SEEK_GE: if the key does not exist, return the smallest key in the database that is larger than the given key. If no larger key exists, then a KeyError will be raised.
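The three matching rules can be illustrated with a small model built on the standard `bisect` module. This is only a sketch of the documented semantics over a sorted list of keys: `seek_model` is a hypothetical helper, and the three constants are stand-ins defined locally, not the values exported by the lsm module.

```python
import bisect

# Local stand-ins for illustration; the real constants come from the lsm module.
SEEK_EQ, SEEK_LE, SEEK_GE = 'eq', 'le', 'ge'


def seek_model(sorted_keys, key, seek_method=SEEK_EQ):
    """Return the key that would be matched, or raise KeyError."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return key                 # an exact match satisfies every seek method
    if seek_method == SEEK_LE and i > 0:
        return sorted_keys[i - 1]  # largest key smaller than `key`
    if seek_method == SEEK_GE and i < len(sorted_keys):
        return sorted_keys[i]      # smallest key larger than `key`
    raise KeyError(key)


keys = ['a', 'c', 'd', 'f']
le_match = seek_model(keys, 'b', SEEK_LE)   # falls back to 'a'
ge_match = seek_model(keys, 'b', SEEK_GE)   # falls forward to 'c'
```

With SEEK_EQ the same lookup of 'b' raises KeyError, as does SEEK_LE below the lowest key or SEEK_GE above the highest key.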
Note
Instead of calling fetch(), you can simply treat your database like a dictionary.
Example:
# These are equivalent:
val = lsm_db.fetch('key')
val = lsm_db['key']

# You can specify the `seek_method` by passing a tuple:
val = lsm_db.fetch('other-key', SEEK_LE)
val = lsm_db['other-key', SEEK_LE]
-
fetch_range
()¶ Fetch a range of keys, inclusive of both the start and end keys. If the start key is not specified, then the first key in the database will be used. If the end key is not specified, then all succeeding keys will be fetched.
If the start key is less than the end key, then the keys will be returned in ascending order. If either key does not exist in the database, the first and last keys are selected as follows:
- The start key will be the smallest key in the database that is larger than the given key (same as SEEK_GE).
- The end key will be the largest key in the database that is smaller than the given key (same as SEEK_LE).
If the start key is greater than the end key, then the keys will be returned in descending order. If either key does not exist in the database, the first and last keys are selected as follows:
- The start key will be the largest key in the database that is smaller than the given key (same as SEEK_LE).
- The end key will be the smallest key in the database that is larger than the given key (same as SEEK_GE).
Note
If one or both keys are None and you wish to fetch in reverse, you need to specify a third parameter, reverse=True.
Note
Rather than using fetch_range(), you can use the __getitem__() API and pass in a slice. The examples below will use the slice API.
Say we have the following data:
db.update({
    'a': 'A',
    'c': 'C',
    'd': 'D',
    'f': 'F',
})
Here are some example calls using ascending order:
>>> db['a':'d']
[('a', 'A'), ('c', 'C'), ('d', 'D')]
>>> db['a':'e']
[('a', 'A'), ('c', 'C'), ('d', 'D')]
>>> db['b':'e']
[('c', 'C'), ('d', 'D')]
If one of the boundaries is not specified (None), then it will start at the lowest or highest key, respectively.
>>> db[:'ccc']
[('a', 'A'), ('c', 'C')]
>>> db['ccc':]
[('d', 'D'), ('f', 'F')]
>>> db[:'x']
[('a', 'A'), ('c', 'C'), ('d', 'D'), ('f', 'F')]
If the start key is higher than the highest key, no results are returned.
>>> db['x':]
[]
If the end key is lower than the lowest key, no results are returned.
>>> db[:'0']
[]
Note
If the start key is greater than the end key, lsm-python will assume you want the range in reverse order.
Examples in descending (reverse) order:
>>> db['d':'a']
[('d', 'D'), ('c', 'C'), ('a', 'A')]
>>> db['e':'a']
[('d', 'D'), ('c', 'C'), ('a', 'A')]
>>> db['e':'b']
[('d', 'D'), ('c', 'C')]
If one of the boundaries is not specified (None), then it will start at the highest and lowest keys, respectively.
>>> db['ccc'::True]
[('c', 'C'), ('a', 'A')]
>>> db[:'ccc':True]
[('f', 'F'), ('d', 'D')]
>>> db['x'::True]
[('f', 'F'), ('d', 'D'), ('c', 'C'), ('a', 'A')]
If the end key is higher than the highest key, no results are returned.
>>> db[:'x':True]
[]
If the start key is lower than the lowest key, no results are returned.
>>> db['0'::True]
[]
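The boundary rules illustrated by these examples can be summarised in a short pure-Python model. It is a sketch of the documented behaviour over an ordinary dict, not the library's code; `fetch_range_model` is a hypothetical name, and its result for a call that passes start < end together with reverse=True is an assumption of the model rather than something this page specifies.

```python
def fetch_range_model(data, start=None, end=None, reverse=False):
    """Return key/value pairs between start and end, inclusive of both."""
    # A start key greater than the end key implies descending order.
    if start is not None and end is not None and start > end:
        reverse = True
    # In reverse, iteration begins at the upper bound and walks downward,
    # so `start` is the upper bound and `end` is the lower bound.
    if reverse:
        upper, lower = start, end
    else:
        lower, upper = start, end
    selected = [
        key for key in sorted(data)
        if (lower is None or key >= lower) and (upper is None or key <= upper)
    ]
    if reverse:
        selected.reverse()
    return [(key, data[key]) for key in selected]


data = {'a': 'A', 'c': 'C', 'd': 'D', 'f': 'F'}
ascending = fetch_range_model(data, 'b', 'e')                 # like db['b':'e']
descending = fetch_range_model(data, 'e', 'a')                # like db['e':'a']
open_reverse = fetch_range_model(data, 'ccc', reverse=True)   # like db['ccc'::True]
```

The model never raises KeyError: a boundary that is not present simply narrows the range to the nearest keys inside it.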
-
flush
()¶ Flush the in-memory tree to disk, creating a new segment.
-
insert
()¶ Insert a key/value pair to the database. If the key exists, the previous value will be overwritten.
Note
Rather than calling insert(), you can simply treat your database as a dictionary and use the __setitem__() API:
# These are equivalent:
lsm_db.insert('key', 'value')
lsm_db['key'] = 'value'
-
keys
()¶ Return a generator that successively yields the keys in the database.
Parameters: reverse (bool) – Return the keys in reverse order.
Return type: generator
-
open
()¶ Open the database. If the database was already open, this will return False, otherwise returns True on success.
-
pages_read
()¶ The number of 4KB pages read from the database file during the lifetime of this connection.
-
pages_written
()¶ The number of 4KB pages written to the database file during the lifetime of this connection.
-
rollback
()¶ Roll back the inner-most transaction. If keep_transaction is True, then the transaction will remain open after the changes are rolled back.
Parameters: keep_transaction (bool) – Whether the transaction will remain open after the changes are rolled back (default=True).
Returns: Boolean indicating whether the changes were rolled back.
-
set_auto_checkpoint
()¶ If this option is set to non-zero value N, then a checkpoint is automatically attempted after each N KB of data have been written to the database file.
The amount of uncheckpointed data already written to the database file is a global parameter. After performing database work (writing to the database file), the process checks if the total amount of uncheckpointed data exceeds the value of this parameter. If so, a checkpoint is performed. This means that this option may cause the connection to perform a checkpoint even if the current connection has itself written very little data into the database file.
The default value is 2048 (checkpoint every 2MB).
-
set_automerge
()¶ The minimum number of segments to merge together at a time.
The default value is 4.
-
set_block_size
()¶ Must be set to a power of two between 64 and 65536, inclusive (block sizes between 64KB and 64MB).
If the connection creates a new database, the block size of the new database is set to the value of this option in KB. After lsm_open() has been called, querying this parameter returns the actual block size of the opened database.
The default value is 1024 (1MB blocks).
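The constraint on the block size (a power of two between 64 and 65536, in KB) is easy to check before constructing a database. The helper below is a hypothetical pre-flight check, not part of the lsm module, and assumes an integer argument.

```python
def is_valid_block_size(block_size_kb):
    """True if the value is a power of two between 64 and 65536, inclusive."""
    # A positive integer is a power of two when exactly one bit is set.
    is_power_of_two = block_size_kb > 0 and (block_size_kb & (block_size_kb - 1)) == 0
    return is_power_of_two and 64 <= block_size_kb <= 65536


# Every accepted value: 64, 128, 256, ..., 65536.
valid_sizes = [n for n in range(1, 65537) if is_valid_block_size(n)]
```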
Warning
This may only be set before calling lsm_open(). The best way to set this value is when you instantiate your LSM object.
If the database is already open, then a ValueError will be raised.
-
set_multiple_processes
()¶ If true, the library uses shared-memory and posix advisory locks to co-ordinate access by clients from within multiple processes. Otherwise, if false, all database clients must be located in the same process.
The default value is True.
Warning
This may only be set before calling lsm_open(). The best way to set this value is when you instantiate your LSM object.
If the database is already open, then a ValueError will be raised.
-
set_page_size
()¶ Set the page size (in bytes). Default value is 4096 bytes.
Warning
This may only be set before calling lsm_open(). The best way to set this value is when you instantiate your LSM object.
If the database is already open, then a ValueError will be raised.
-
set_readonly
()¶ This parameter may only be set before lsm_open() is called.
-
set_safety
()¶ Valid values are 0, 1 (the default) and 2. This parameter determines how robust the database is in the face of a system crash (e.g. a power failure or operating system crash), as follows:
- 0 (off): No robustness. A system crash may corrupt the database.
- 1 (normal): Some robustness. A system crash cannot corrupt the database file, but recently committed transactions may be lost following recovery.
- 2 (full): Full robustness. A system crash cannot corrupt the database file. Following recovery the database file contains all successfully committed transactions.
Values:
- SAFETY_OFF
- SAFETY_NORMAL
- SAFETY_FULL
-
transaction
()¶ Create a context manager that runs the wrapped block in a transaction.
Example:
with lsm_db.transaction() as txn:
    lsm_db['k1'] = 'v1'

with lsm_db.transaction() as txn:
    lsm_db['k1'] = 'v1-1'
    txn.rollback()

assert lsm_db['k1'] == 'v1'
You can also use the transaction() method as a decorator. If the wrapped function returns normally, the transaction is committed, otherwise it is rolled back.
@lsm_db.transaction()
def transfer_funds(from_account, to_account, amount):
    # transfer money...
    return
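To make the idea of nesting concrete, here is a toy model in which each open transaction saves a snapshot of the data, so committing or rolling back affects only the inner-most level. It is a sketch over a plain dict, not how LSM implements transactions; `TransactionStackModel` is a hypothetical class, and unlike rollback() with keep_transaction=True, its rollback closes the level it rolls back.

```python
import copy


class TransactionStackModel:
    """Toy model of nested transactions over a dict (not the lsm implementation)."""

    def __init__(self):
        self.data = {}
        self._snapshots = []  # one saved copy of `data` per open transaction

    def begin(self):
        self._snapshots.append(copy.deepcopy(self.data))

    def commit(self):
        # Committing the inner-most transaction just discards its snapshot.
        self._snapshots.pop()

    def rollback(self):
        # Rolling back the inner-most transaction restores its snapshot.
        self.data = self._snapshots.pop()


model = TransactionStackModel()
model.begin()                       # outer transaction
model.data['k1'] = 'v1'
model.begin()                       # nested transaction
model.data['k1'] = 'v1-1'
model.rollback()                    # undoes only the nested change
value_after_inner_rollback = model.data['k1']
model.commit()                      # the outer change is kept
```

The outer write survives the inner rollback, which is the same outcome the assertion in the example above checks for.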
-
update
()¶ Add an arbitrary number of key/value pairs. Unlike the Python dict.update method, update() does not accept arbitrary keyword arguments and only takes a single dictionary as the parameter.
Parameters: values (dict) – A dictionary of key/value pairs.
-
values
()¶ Return a generator that successively yields the values in the database. The values are ordered based on their key.
Parameters: reverse (bool) – Return the values in reverse key-order.
Return type: generator
-
work
()¶ Explicitly perform work on the database structure.
If the database has an old in-memory tree when work() is called, it is flushed to disk. If this means that more than nkb KB of data is written to the database file, no further work is performed. Otherwise, the number of KB written is subtracted from nkb before proceeding.
Typically you will use 1 for the parameter in order to optimize the database.
Parameters: nkb (int) – Limit on the number of KB of data that should be written to the database file before the call returns. It is a hint and is not honored strictly.
Returns: The number of KB written to the database file.
Note
A background thread or process is ideal for running this method.
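One possible shape for such a background worker is sketched below using the standard `threading` module. `run_work_loop` and `FakeDB` are hypothetical names: `FakeDB` is a stand-in that exposes only a work() method so the loop can be shown without a real database, and whether a single connection may safely be shared between threads is not covered here and is left as an assumption to verify.

```python
import threading


def run_work_loop(db, stop_event, nkb=1, interval=0.01):
    """Call db.work(nkb) repeatedly until stop_event is set; return total KB written."""
    total_kb = 0
    while not stop_event.is_set():
        total_kb += db.work(nkb)
        stop_event.wait(interval)  # sleep, but wake immediately when asked to stop
    return total_kb


class FakeDB:
    """Hypothetical stand-in exposing only a work() method, for demonstration."""

    def __init__(self, stop_event, max_calls=3):
        self.calls = 0
        self._stop_event = stop_event
        self._max_calls = max_calls

    def work(self, nkb):
        self.calls += 1
        if self.calls >= self._max_calls:
            self._stop_event.set()  # end the demo after a few iterations
        return nkb


stop = threading.Event()
fake_db = FakeDB(stop)
worker = threading.Thread(target=run_work_loop, args=(fake_db, stop))
worker.start()
worker.join()
```

In an application the stop event would be set by the main thread, and the worker joined, before the database is closed.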
-
-
class
lsm.
Cursor
¶ Wrapper around the lsm_cursor object.
Functions seek(), first(), and last() are seek functions. Whether or not next() and previous() may be called successfully depends on the most recent seek function called on the cursor. Specifically,
- At least one seek function must have been called on the cursor.
- To call next(), the most recent call to a seek function must have been either first() or a call to seek() specifying SEEK_GE.
- To call previous(), the most recent call to a seek function must have been either last() or a call to seek() specifying SEEK_LE.
Otherwise, if the above conditions are not met when next() or previous() is called, LSM_MISUSE is returned and the cursor position remains unchanged.
For more information, see:
http://www.sqlite.org/src4/doc/trunk/www/lsmusr.wiki#reading_from_a_database
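The rules above amount to remembering the most recent seek function. The following sketch models only that bookkeeping and never touches a database; `CursorRulesModel` and its can_next()/can_previous() methods are hypothetical, and the seek constants are local stand-ins rather than the values exported by the lsm module.

```python
# Local stand-ins for illustration; the real constants come from the lsm module.
SEEK_EQ, SEEK_LE, SEEK_GE = 'eq', 'le', 'ge'


class CursorRulesModel:
    """Tracks the last seek function to decide which direction may be iterated."""

    def __init__(self):
        self._last_seek = None  # None until a seek function has been called

    def first(self):
        self._last_seek = 'first'

    def last(self):
        self._last_seek = 'last'

    def seek(self, seek_method=SEEK_EQ):
        self._last_seek = seek_method

    def can_next(self):
        # next() requires first() or seek(..., SEEK_GE) as the latest seek.
        return self._last_seek in ('first', SEEK_GE)

    def can_previous(self):
        # previous() requires last() or seek(..., SEEK_LE) as the latest seek.
        return self._last_seek in ('last', SEEK_LE)


rules = CursorRulesModel()
allowed_before_any_seek = (rules.can_next(), rules.can_previous())
rules.first()
allowed_after_first = (rules.can_next(), rules.can_previous())
rules.seek(SEEK_LE)
allowed_after_seek_le = (rules.can_next(), rules.can_previous())
```

Note that in this model a seek with SEEK_EQ permits neither direction, which follows from the conditions as they are listed.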
-
__enter__
()¶ Expose the cursor as a context manager. After the wrapped block, the cursor will be closed, which is very important.
-
__iter__
¶ Iterate from the cursor’s current position. The iterator returns successive key/value pairs.
-
close
()¶ Close the cursor.
Note
If you are using the cursor as a context manager, then it is not necessary to call this method.
Warning
If a cursor is not closed, then the database cannot be closed properly.
-
compare
()¶ Compare the given key with key at the cursor’s current position.
-
fetch_range
()¶ Fetch a range of keys, inclusive of both the start and end keys. If the start key is not specified, then the first key will be used. If the end key is not specified, then all succeeding keys will be fetched.
For complete details, see the docstring for
LSM.fetch_range()
.
-
fetch_until
()¶ This method returns a generator that yields key/value pairs obtained by iterating from the cursor’s current position until it reaches the given
key
.
-
first
()¶ Jump to the first key in the database.
-
is_valid
()¶ Return a boolean indicating whether the cursor is pointing at a valid record.
-
key
()¶ Return the key at the cursor’s current position.
-
keys
()¶ Return a generator that successively yields keys.
-
last
()¶ Jump to the last key in the database.
-
next
¶
-
open
()¶ Open the cursor. In general this method does not need to be called by applications, as it is called automatically when a
Cursor
is instantiated.
-
previous
()¶ Move the cursor to the previous record. If no previous record exists, then a StopIteration will be raised.
If you encounter an Exception indicating Misuse (21) when calling this method, then you need to be sure that you are either calling last() or seek() with a seek method of SEEK_LE.
-
seek
()¶ Seek to the given key using the specified matching method. If the operation did not find a valid key, then a KeyError will be raised.
- SEEK_EQ (default): match key based on equality. If no match is found, then a KeyError is raised.
- SEEK_LE: if the key does not exist, return the largest key in the database that is smaller than the given key. If no smaller key exists, then a KeyError will be raised.
- SEEK_GE: if the key does not exist, return the smallest key in the database that is larger than the given key. If no larger key exists, then a KeyError will be raised.
For more details, read:
http://www.sqlite.org/src4/doc/trunk/www/lsmapi.wiki#lsm_csr_seek
-
value
()¶ Return the value at the cursor’s current position.
-
values
()¶ Return a generator that successively yields values.
-
class
lsm.
Transaction
¶ Context manager and decorator to run the wrapped block in a transaction. LSM supports nested transactions, so the context manager/decorator can be mixed and matched and nested arbitrarily.
Rather than instantiating this class directly, use LSM.transaction().
Example:
with lsm_db.transaction() as txn:
    lsm_db['k1'] = 'v1'

with lsm_db.transaction() as txn:
    lsm_db['k1'] = 'v1-1'
    txn.rollback()

assert lsm_db['k1'] == 'v1'
-
commit
()¶ Commit the transaction and optionally open a new transaction. This is especially useful for context managers, where you may commit midway through a wrapped block of code, but want to retain transactional behavior for the rest of the block.
-
rollback
()¶ Roll back the transaction and optionally retain the open transaction. This is especially useful for context managers, where you may roll back midway through a wrapped block of code, but want to retain the transactional behavior for the rest of the block.
-
Constants¶
Seek methods, which can be used when fetching records or slices. The descriptions below state where the cursor is left when the requested key is not present in the database.
SEEK_EQ
- The cursor is left at EOF (invalidated). A call to lsm_csr_valid() returns zero.
SEEK_LE
- The cursor is left pointing to the largest key in the database that is smaller than the given key. If the database contains no keys smaller than the given key, the cursor is left at EOF.
SEEK_GE
- The cursor is left pointing to the smallest key in the database that is larger than the given key. If the database contains no keys larger than the given key, the cursor is left at EOF.
SEEK_LEFAST
- When SEEK_LEFAST is passed as the seek method (the fourth parameter of the underlying lsm_csr_seek() call), the database is searched in a similar manner to SEEK_LE, with two differences:
- Even if a key can be found (the cursor is not left at EOF), the lsm_csr_value() function may not be used (attempts to do so return LSM_MISUSE).
- The key that the cursor is left pointing to may be one that has been recently deleted from the database. In this case it is guaranteed that the returned key is larger than any key currently in the database that is less than or equal to the given key.
SEEK_LEFAST requests are intended to be used to allocate database keys.
Safety levels, used in calls to LSM.set_safety().
SAFETY_OFF
SAFETY_NORMAL
SAFETY_FULL