API Documentation

class lsm.LSM

Python wrapper for SQLite4’s LSM implementation.

http://www.sqlite.org/src4/doc/trunk/www/lsmapi.wiki

__contains__

Return a boolean indicating whether the given key exists.

__delitem__

Dictionary API wrapper for the delete() and delete_range() methods.

Parameters:key – Either a string or a slice. Additionally, a second parameter can be supplied indicating what seek method to use.

Note

When deleting a range of keys, the start and end keys themselves are not deleted, only the intervening keys.

__enter__()

Use the database as a context manager. The database will be closed when the wrapped block exits.

__getitem__

Dictionary API wrapper for the fetch() and fetch_range() methods.

Parameters:key – Either a string or a slice. Additionally, a second parameter can be supplied indicating what seek method to use.

Examples using single keys:

  • ['charlie'], search for the key charlie.
  • ['2014.XXX', SEEK_LE], return the value for the key 2014.XXX. If no such key exists, use the highest key that does not exceed 2014.XXX. If there is no lower key, then a KeyError will be raised.
  • ['2014.XXX', SEEK_GE], return the value for the key 2014.XXX. If no such key exists, use the lowest key that does not precede 2014.XXX. If there is no higher key, then a KeyError will be raised.

Examples using slices (SEEK_LE and SEEK_GE cannot be used with slices):

  • ['a':'z'], return all key/value pairs from a to z in ascending order.
  • ['z':'a'], return all key/value pairs from z to a in reverse order.
  • ['a':], return all key/value pairs from a on up.
  • [:'z'], return all key/value pairs up to and including z.
  • ['a'::True], return all key/value pairs from a on up in reverse order.

Note

When fetching slices, a KeyError will not be raised under any circumstances.

__init__

Parameters:
  • filename (str) – Path to database file.
  • open_database (bool) – Whether to open the database automatically when the class is instantiated.
  • page_size (int) – Page size in bytes. Default is 4096.
  • block_size (int) – Block size in KB. Default is 1024 (1MB).
  • safety_level (int) – Safety level in the face of a crash. See set_safety().
  • readonly (bool) – Open database for read-only access.
  • enable_multiple_processes (bool) – Allow multiple processes to access the database. Default is True.

__iter__

Efficiently iterate through the items in the database. This method yields successive key/value pairs.

Note

The return value is a generator.

__reversed__()

Efficiently iterate through the items in the database in reverse order. This method yields successive key/value pairs.

__setitem__

Dictionary API wrapper for the insert() method.

autoflush()

This value determines the amount of data allowed to accumulate in a live in-memory tree before it is marked as old. After committing a transaction, a connection checks if the size of the live in-memory tree, including data structure overhead, is greater than the value of this option in KB. If it is, and there is not already an old in-memory tree, the live in-memory tree is marked as old.

The maximum allowable value is 1048576 (1GB). There is no minimum value. If this parameter is set to zero, then an attempt is made to mark the live in-memory tree as old after each transaction is committed.

The default value is 1024 (1MB).

begin()

Begin a transaction. Transactions can be nested.

Note

In most cases it is preferable to use the transaction() context manager/decorator.

checkpoint()

Write to the database file header.

checkpoint_size()

The number of KB written to the database file since the most recent checkpoint.

close()

Close the database. If the database was already closed, this will return False, otherwise returns True on success.

Warning

You must close all cursors before attempting to close the db, otherwise an IOError will be raised.

commit()

Commit the inner-most transaction.

Returns:Boolean indicating whether the changes were committed.

config_mmap()

If this value is set to 0, the database file is accessed using ordinary read/write IO functions. If it is set to 1, the database file is memory mapped and accessed that way. If it is set to any value N greater than 1, then up to the first N KB of the file are memory mapped, and any remainder is accessed using read/write IO.
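The three cases can be summarized with a small pure-Python helper. This is only a sketch of the rule as described above; mmap_split() is a hypothetical name and is not part of the library.

```python
def mmap_split(setting, file_kb):
    """Return (KB accessed via mmap, KB accessed via read/write IO).

    Hypothetical helper that mirrors the documented rule; it does not
    touch any real database file.
    """
    if setting == 0:
        return 0, file_kb          # ordinary read/write IO only
    if setting == 1:
        return file_kb, 0          # the file is memory mapped
    mapped = min(setting, file_kb)  # up to the first N KB are mapped
    return mapped, file_kb - mapped

print(mmap_split(1024, 4096))  # (1024, 3072)
```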

cursor()

Create a cursor and return it as a context manager. After the wrapped block, the cursor is closed.

Parameters:reverse (bool) – Whether the cursor will iterate over keys in descending order.

Example:

with lsm_db.cursor() as cursor:
    for key, value in cursor.fetch_range('a', 'z'):
        pass  # do something with data...

with lsm_db.cursor(reverse=True) as cursor:
    for key, value in cursor.fetch_range('z', 'a'):
        pass  # data is now in descending order.

Note

In general the cursor() context manager should be used as it ensures cursors are properly cleaned up when you are done using them.

LSM databases cannot be closed as long as there are any open cursors, so it is very important to close them when finished.

delete()

Remove the specified key and value from the database. If the key does not exist, no exception is raised.

Note

You can delete keys using Python’s dictionary API:

# These are equivalent:
lsm_db.delete('some-key')
del lsm_db['some-key']

delete_range()

Delete a range of keys, though the start and end keys themselves are not deleted.

Parameters:
  • start (str) – Beginning of range. This key is not removed.
  • end (str) – End of range. This key is not removed.

Rather than using delete_range(), you can use Python’s del keyword, specifying a slice of keys.

Example:

>>> for key in 'abcdef':
...     db[key] = key.upper()

>>> del db['a':'c']  # This will only delete 'b'.
>>> 'a' in db, 'b' in db, 'c' in db
(True, False, True)

>>> del db['0':'d']
>>> print(list(db))
[('d', 'D'), ('e', 'E'), ('f', 'F')]
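The exclusive-endpoint rule can be reproduced with a plain dict. The sketch below is a model of the behavior described above, not the library's implementation; the delete_range() function here is a hypothetical stand-in that operates on a dict.

```python
def delete_range(data, start, end):
    """Delete every key strictly between start and end; keep both endpoints."""
    for key in [k for k in data if start < k < end]:
        del data[key]

db = {key: key.upper() for key in 'abcdef'}
delete_range(db, 'a', 'c')  # removes only 'b'
delete_range(db, '0', 'd')  # removes 'a' and 'c'; the end key 'd' stays
print(sorted(db.items()))   # [('d', 'D'), ('e', 'E'), ('f', 'F')]
```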
fetch()

Retrieve a value from the database.

Parameters:
  • key (str) – The key to retrieve.
  • seek_method (int) – Instruct the database how to match the key.
Raises:KeyError if a matching key cannot be found. See below for more details.

The following seek methods can be specified.

  • SEEK_EQ (default): match key based on equality. If no match is found, then a KeyError is raised.
  • SEEK_LE: if the key does not exist, return the largest key in the database that is smaller than the given key. If no smaller key exists, then a KeyError will be raised.
  • SEEK_GE: if the key does not exist, return the smallest key in the database that is larger than the given key. If no larger key exists, then a KeyError will be raised.
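The matching rules can be modelled in plain Python with a sorted list of keys and the standard bisect module. This is a sketch of the semantics only, not how the library works internally; the constant values and the seek() helper are hypothetical stand-ins.

```python
import bisect

# Hypothetical stand-ins for the library's constants; real values may differ.
SEEK_EQ, SEEK_LE, SEEK_GE = 'eq', 'le', 'ge'

def seek(sorted_keys, key, seek_method=SEEK_EQ):
    """Return the key that a lookup would match, or raise KeyError."""
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return key  # an exact match satisfies every seek method
    if seek_method == SEEK_LE and i > 0:
        return sorted_keys[i - 1]  # largest key smaller than `key`
    if seek_method == SEEK_GE and i < len(sorted_keys):
        return sorted_keys[i]  # smallest key larger than `key`
    raise KeyError(key)

keys = ['a', 'c', 'd', 'f']
print(seek(keys, 'c'))           # c
print(seek(keys, 'e', SEEK_LE))  # d
print(seek(keys, 'b', SEEK_GE))  # c
```

Looking up 'b' with SEEK_EQ, '0' with SEEK_LE, or 'x' with SEEK_GE raises KeyError in this model, matching the rules above.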

Note

Instead of calling fetch(), you can simply treat your database like a dictionary.

Example:

# These are equivalent:
val = lsm_db.fetch('key')
val = lsm_db['key']

# You can specify the `seek_method` by passing a tuple:
val = lsm_db.fetch('other-key', SEEK_LE)
val = lsm_db['other-key', SEEK_LE]

fetch_range()

Fetch a range of keys, inclusive of both the start and end keys. If the start key is not specified, then the first key in the database will be used. If the end key is not specified, then all succeeding keys will be fetched.

If the start key is less than the end key, then the keys will be returned in ascending order. The logic for selecting the first and last key in the event either key is missing is such that:

  • The start key will be the smallest key in the database that is larger than the given key (same as SEEK_GE).
  • The end key will be the largest key in the database that is smaller than the given key (same as SEEK_LE).

If the start key is greater than the end key, then the keys will be returned in descending order. The logic for selecting the first and last key in the event either key is missing is such that:

  • The start key will be the largest key in the database that is smaller than the given key (same as SEEK_LE).
  • The end key will be the smallest key in the database that is larger than the given key (same as SEEK_GE).

Note

If one or both keys is None and you wish to fetch in reverse, you need to specify a third parameter, reverse=True.

Note

Rather than using fetch_range(), you can use the __getitem__() API and pass in a slice. The examples below will use the slice API.

Say we have the following data:

db.update({
  'a': 'A',
  'c': 'C',
  'd': 'D',
  'f': 'F',
})

Here are some example calls using ascending order:

>>> db['a':'d']
[('a', 'A'), ('c', 'C'), ('d', 'D')]

>>> db['a':'e']
[('a', 'A'), ('c', 'C'), ('d', 'D')]

>>> db['b':'e']
[('c', 'C'), ('d', 'D')]

If one of the boundaries is not specified (None), then it will start at the lowest or highest key, respectively.

>>> db[:'ccc']
[('a', 'A'), ('c', 'C')]

>>> db['ccc':]
[('d', 'D'), ('f', 'F')]

>>> db[:'x']
[('a', 'A'), ('c', 'C'), ('d', 'D'), ('f', 'F')]

If the start key is higher than the highest key, no results are returned.

>>> db['x':]
[]

If the end key is lower than the lowest key, no results are returned.

>>> db[:'0']
[]

Note

If the start key is greater than the end key, lsm-python will assume you want the range in reverse order.

Examples in descending (reverse) order:

>>> db['d':'a']
[('d', 'D'), ('c', 'C'), ('a', 'A')]

>>> db['e':'a']
[('d', 'D'), ('c', 'C'), ('a', 'A')]

>>> db['e':'b']
[('d', 'D'), ('c', 'C')]

If one of the boundaries is not specified (None), then it will start at the highest and lowest keys, respectively.

>>> db['ccc'::True]
[('c', 'C'), ('a', 'A')]

>>> db[:'ccc':True]
[('f', 'F'), ('d', 'D')]

>>> db['x'::True]
[('f', 'F'), ('d', 'D'), ('c', 'C'), ('a', 'A')]

If the end key is higher than the highest key, no results are returned.

>>> db[:'x':True]
[]

If the start key is lower than the lowest key, no results are returned.

>>> db['0'::True]
[]
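The examples above can be reproduced with a small model over a plain dict. This is a sketch that was only checked against the documented examples; it is not the library's implementation, and cases not shown above (such as passing reverse=True with a start key lower than the end key) are not modelled.

```python
def fetch_range(data, start=None, end=None, reverse=False):
    """Model of an inclusive range fetch over a plain dict."""
    if start is not None and end is not None and start > end:
        reverse = True  # start > end implies descending order
    if reverse:
        # Descending: start is the upper bound, end is the lower bound.
        lower, upper = end, start
    else:
        lower, upper = start, end
    keys = [k for k in sorted(data, reverse=reverse)
            if (lower is None or k >= lower) and (upper is None or k <= upper)]
    return [(k, data[k]) for k in keys]

db = {'a': 'A', 'c': 'C', 'd': 'D', 'f': 'F'}
print(fetch_range(db, 'b', 'e'))  # [('c', 'C'), ('d', 'D')]
print(fetch_range(db, 'e', 'b'))  # [('d', 'D'), ('c', 'C')]
```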
flush()

Flush the in-memory tree to disk, creating a new segment.

insert()

Insert a key/value pair to the database. If the key exists, the previous value will be overwritten.

Note

Rather than calling insert(), you can simply treat your database as a dictionary and use the __setitem__() API:

# These are equivalent:
lsm_db.insert('key', 'value')
lsm_db['key'] = 'value'

keys()

Return a generator that successively yields the keys in the database.

Parameters:reverse (bool) – Return the keys in reverse order.
Return type:generator

open()

Open the database. If the database was already open, this will return False, otherwise returns True on success.

pages_read()

The number of 4KB pages read from the database file during the lifetime of this connection.

pages_written()

The number of 4KB pages written to the database file during the lifetime of this connection.

rollback()

Roll back the inner-most transaction. If keep_transaction is True, then the transaction will remain open after the changes are rolled back.

Parameters:keep_transaction (bool) – Whether the transaction will remain open after the changes are rolled back (default=True).
Returns:Boolean indicating whether the changes were rolled back.
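How begin(), commit() and rollback() interact when transactions are nested can be pictured as a stack of snapshots. The ToyStore class below is a hypothetical pure-Python model written for illustration; it says nothing about how LSM implements transactions internally.

```python
class ToyStore:
    """Dict with nested transactions, modelled as a stack of snapshots."""

    def __init__(self):
        self.data = {}
        self._snapshots = []

    def begin(self):
        # Remember the state at the start of the new inner-most transaction.
        self._snapshots.append(dict(self.data))

    def commit(self):
        # Committing the inner-most transaction discards its snapshot.
        self._snapshots.pop()

    def rollback(self, keep_transaction=True):
        # Restore the state from when the inner-most transaction began.
        self.data = dict(self._snapshots[-1])
        if not keep_transaction:
            self._snapshots.pop()

store = ToyStore()
store.begin()
store.data['k1'] = 'v1'
store.begin()                           # nested transaction
store.data['k1'] = 'v1-1'
store.rollback(keep_transaction=False)  # undo only the inner change
store.commit()
print(store.data)                       # {'k1': 'v1'}
```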
set_auto_checkpoint()

If this option is set to non-zero value N, then a checkpoint is automatically attempted after each N KB of data have been written to the database file.

The amount of uncheckpointed data already written to the database file is a global parameter. After performing database work (writing to the database file), the process checks if the total amount of uncheckpointed data exceeds the value of this parameter. If so, a checkpoint is performed. This means that this option may cause the connection to perform a checkpoint even if the current connection has itself written very little data into the database file.

The default value is 2048 (checkpoint every 2MB).
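The effect of the shared counter can be illustrated with a toy model. SharedDatabaseFile and its methods are hypothetical names used only for this sketch; the model assumes that a value of zero disables automatic checkpoints and that a checkpoint resets the counter.

```python
class SharedDatabaseFile:
    """Toy model of the uncheckpointed-data counter shared by all connections."""

    def __init__(self, auto_checkpoint_kb=2048):
        self.auto_checkpoint_kb = auto_checkpoint_kb
        self.uncheckpointed_kb = 0
        self.checkpoints = []  # names of the connections that checkpointed

    def write(self, connection, kb):
        self.uncheckpointed_kb += kb
        # Assumption: zero means automatic checkpoints are disabled.
        if (self.auto_checkpoint_kb
                and self.uncheckpointed_kb > self.auto_checkpoint_kb):
            self.checkpoints.append(connection)
            self.uncheckpointed_kb = 0

db_file = SharedDatabaseFile()
db_file.write('conn-a', 2040)  # below the threshold, no checkpoint
db_file.write('conn-b', 16)    # small write, but the shared total is 2056 KB
print(db_file.checkpoints)     # ['conn-b']
```

Here conn-b performs the checkpoint even though it wrote only 16 KB, because the threshold applies to the total across all connections.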

set_automerge()

The minimum number of segments to merge together at a time.

The default value is 4.

set_block_size()

Must be set to a power of two between 64 and 65536, inclusive (block sizes between 64KB and 64MB).

If the connection creates a new database, the block size of the new database is set to the value of this option in KB. After lsm_open() has been called, querying this parameter returns the actual block size of the opened database.

The default value is 1024 (1MB blocks).
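A value can be checked against this constraint before it is passed to the database. The helper below is a hypothetical sketch and not part of the library.

```python
def is_valid_block_size(kb):
    """True if kb is a power of two between 64 and 65536, inclusive."""
    return (isinstance(kb, int)
            and 64 <= kb <= 65536
            and (kb & (kb - 1)) == 0)  # powers of two have a single bit set

print(is_valid_block_size(1024))  # True
print(is_valid_block_size(1000))  # False
```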

Warning

This may only be set before calling lsm_open(). The best way to set this value is when you instantiate your LSM object.

If the database is already open, then a ValueError will be raised.

set_multiple_processes()

If true, the library uses shared memory and POSIX advisory locks to coordinate access by clients from within multiple processes. If false, all database clients must be located in the same process.

The default value is True.

Warning

This may only be set before calling lsm_open(). The best way to set this value is when you instantiate your LSM object.

If the database is already open, then a ValueError will be raised.

set_page_size()

Set the page size (in bytes). Default value is 4096 bytes.

Warning

This may only be set before calling lsm_open(). The best way to set this value is when you instantiate your LSM object.

If the database is already open, then a ValueError will be raised.

set_readonly()

This parameter may only be set before lsm_open() is called.

set_safety()

Valid values are 0, 1 (the default) and 2. This parameter determines how robust the database is in the face of a system crash (e.g. a power failure or operating system crash):

  • 0 (off): No robustness. A system crash may corrupt the database.
  • 1 (normal): Some robustness. A system crash cannot corrupt the database file, but recently committed transactions may be lost following recovery.
  • 2 (full): Full robustness. A system crash cannot corrupt the database file. Following recovery the database file contains all successfully committed transactions.

Values:

  • SAFETY_OFF
  • SAFETY_NORMAL
  • SAFETY_FULL

transaction()

Create a context manager that runs the wrapped block in a transaction.

Example:

with lsm_db.transaction() as txn:
    lsm_db['k1'] = 'v1'

with lsm_db.transaction() as txn:
    lsm_db['k1'] = 'v1-1'
    txn.rollback()

assert lsm_db['k1'] == 'v1'

You can also use the transaction() method as a decorator. If the wrapped function returns normally, the transaction is committed, otherwise it is rolled back.

@lsm_db.transaction()
def transfer_funds(from_account, to_account, amount):
    # transfer money...
    return

update()

Add an arbitrary number of key/value pairs. Unlike the Python dict.update method, update() does not accept arbitrary keyword arguments and only takes a single dictionary as the parameter.

Parameters:values (dict) – A dictionary of key/value pairs.

values()

Return a generator that successively yields the values in the database. The values are ordered based on their key.

Parameters:reverse (bool) – Return the values in reverse key-order.
Return type:generator

work()

Explicitly perform work on the database structure.

If the database has an old in-memory tree when work() is called, it is flushed to disk. If this means that more than nkb KB of data is written to the database file, no further work is performed. Otherwise, the number of KB written is subtracted from nkb before proceeding.

Typically you will use 1 for the parameter in order to optimize the database.

Parameters:nkb (int) – Limit on the number of KB of data that should be written to the database file before the call returns. It is a hint and is not honored strictly.
Returns:The number of KB written to the database file.

Note

A background thread or process is ideal for running this method.

class lsm.Cursor

Wrapper around the lsm_cursor object.

Functions seek(), first(), and last() are seek functions. Whether or not next() and previous() may be called successfully depends on the most recent seek function called on the cursor. Specifically,

  • At least one seek function must have been called on the cursor.
  • To call next(), the most recent call to a seek function must have been either first() or a call to seek() specifying SEEK_GE.
  • To call previous(), the most recent call to a seek function must have been either last() or a call to seek() specifying SEEK_LE.

Otherwise, if the above conditions are not met when next() or previous() is called, LSM_MISUSE is returned and the cursor position remains unchanged.
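These rules amount to a small state machine keyed on the most recent seek call. The class below is a hypothetical model for illustration only; it tracks which moves are permitted and does not touch a real cursor.

```python
class ToyCursorState:
    """Tracks which moves are legal after the most recent seek call."""

    def __init__(self):
        self.last_seek = None  # no seek function has been called yet

    def first(self):
        self.last_seek = 'first'

    def last(self):
        self.last_seek = 'last'

    def seek(self, method):
        self.last_seek = method  # 'SEEK_EQ', 'SEEK_LE' or 'SEEK_GE'

    def can_next(self):
        return self.last_seek in ('first', 'SEEK_GE')

    def can_previous(self):
        return self.last_seek in ('last', 'SEEK_LE')

state = ToyCursorState()
state.seek('SEEK_GE')
print(state.can_next(), state.can_previous())  # True False
```

Note that after a SEEK_EQ seek, neither next() nor previous() is permitted under these rules.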

For more information, see:

http://www.sqlite.org/src4/doc/trunk/www/lsmusr.wiki#reading_from_a_database

__enter__()

Expose the cursor as a context manager. After the wrapped block, the cursor will be closed, which is very important.

__iter__

Iterate from the cursor’s current position. The iterator returns successive key/value pairs.

close()

Close the cursor.

Note

If you are using the cursor as a context manager, then it is not necessary to call this method.

Warning

If a cursor is not closed, then the database cannot be closed properly.

compare()

Compare the given key with key at the cursor’s current position.

fetch_range()

Fetch a range of keys, inclusive of both the start and end keys. If the start key is not specified, then the first key will be used. If the end key is not specified, then all succeeding keys will be fetched.

For complete details, see the docstring for LSM.fetch_range().

fetch_until()

This method returns a generator that yields key/value pairs obtained by iterating from the cursor’s current position until it reaches the given key.

first()

Jump to the first key in the database.

is_valid()

Return a boolean indicating whether the cursor is pointing at a valid record.

key()

Return the key at the cursor’s current position.

keys()

Return a generator that successively yields keys.

last()

Jump to the last key in the database.

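next()

Move the cursor to the next record. To call next(), the most recent call to a seek function must have been either first() or a call to seek() specifying SEEK_GE.

open()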

Open the cursor. In general this method does not need to be called by applications, as it is called automatically when a Cursor is instantiated.

previous()

Move the cursor to the previous record. If no previous record exists, then a StopIteration will be raised.

If you encounter an Exception indicating Misuse (21) when calling this method, make sure that the most recent seek call was either last() or seek() with a seek method of SEEK_LE.

seek()

Seek to the given key using the specified matching method. If the operation did not find a valid key, then a KeyError will be raised.

  • SEEK_EQ (default): match key based on equality. If no match is found, then a KeyError is raised.
  • SEEK_LE: if the key does not exist, return the largest key in the database that is smaller than the given key. If no smaller key exists, then a KeyError will be raised.
  • SEEK_GE: if the key does not exist, return the smallest key in the database that is larger than the given key. If no larger key exists, then a KeyError will be raised.

For more details, read:

http://www.sqlite.org/src4/doc/trunk/www/lsmapi.wiki#lsm_csr_seek

value()

Return the value at the cursor’s current position.

values()

Return a generator that successively yields values.

class lsm.Transaction

Context manager and decorator to run the wrapped block in a transaction. LSM supports nested transactions, so the context manager/decorator can be mixed and matched and nested arbitrarily.

Rather than instantiating this class directly, use LSM.transaction().

Example:

with lsm_db.transaction() as txn:
    lsm_db['k1'] = 'v1'

with lsm_db.transaction() as txn:
    lsm_db['k1'] = 'v1-1'
    txn.rollback()

assert lsm_db['k1'] == 'v1'

commit()

Commit the transaction and optionally open a new transaction. This is especially useful for context managers, where you may commit midway through a wrapped block of code, but want to retain transactional behavior for the rest of the block.

rollback()

Roll back the transaction and optionally retain the open transaction. This is especially useful for context managers, where you may roll back midway through a wrapped block of code, but want to retain the transactional behavior for the rest of the block.

Constants

Seek methods, used when fetching records. If the requested key is present in the database, the cursor is left pointing at it. Otherwise, the outcome depends on the seek method:

SEEK_EQ
The cursor is left at EOF (invalidated). A call to lsm_csr_valid() returns zero.
SEEK_LE
The cursor is left pointing to the largest key in the database that is smaller than (pKey/nKey). If the database contains no keys smaller than (pKey/nKey), the cursor is left at EOF.
SEEK_GE
The cursor is left pointing to the smallest key in the database that is larger than (pKey/nKey). If the database contains no keys larger than (pKey/nKey), the cursor is left at EOF.

If the seek method is SEEK_LEFAST, the database is searched in a similar manner to SEEK_LE, with two differences:

  • Even if a key can be found (the cursor is not left at EOF), the lsm_csr_value() function may not be used (attempts to do so return LSM_MISUSE).
  • The key that the cursor is left pointing to may be one that has been recently deleted from the database. In this case it is guaranteed that the returned key is larger than any key currently in the database that is less than or equal to (pKey/nKey).

SEEK_LEFAST requests are intended to be used to allocate database keys.

Safety levels, used in calls to LSM.set_safety().

  • SAFETY_OFF
  • SAFETY_NORMAL
  • SAFETY_FULL