SQLAlchemy provides a broad set of tools and patterns for every area of writing applications that talk to relational databases. Its many areas of functionality work together to provide a cohesive package, but only a little familiarity with each concept is needed to get off the ground.
That said, the two most prominent features of SQLAlchemy are its SQL construction system and its object-relational mapper, both covered in detail below.
For a comprehensive tour through all of SQLAlchemy's components, below is a "Trail Map" of the knowledge dependencies between these components, indicating the order in which concepts may be learned.
Start
 |
 |--- Connection Pooling
 |       |
 |       |--- Connection Pool Configuration
 |
 +--- Establishing a Database Engine
 |       |
 |       |--- Database Engine Options
 |
 +--- Describing Tables with MetaData
 |       |
 |       |--- Creating and Dropping Database Tables
 |       |
 |       +--- SQL Construction
 |       |
 |       +--- Data Mapping
 |               |
 |               +--- Advanced Data Mapping
 |
 +--- The Types System
Note: This section describes the connection pool module of SQLAlchemy, which is the smallest component of the library that can be used on its own. If you are interested in using SQLAlchemy for query construction or Object Relational Mapping, this module is automatically managed behind the scenes; you can skip ahead to Database Engines in that case.
At the base of any database helper library is a system for efficiently acquiring connections to the database. Since establishing a database connection is typically a somewhat expensive operation, an application needs a way to get at database connections repeatedly without incurring the full overhead each time. Particularly for server-side web applications, a connection pool is the standard way to maintain a "pool" of database connections which are used over and over again among many requests. Connection pools typically are configured with a certain "size", which represents how many connections can be in use simultaneously before additional connections must be established.
SQLAlchemy includes a pooling module that can be used completely independently of the rest of the toolset. This section describes how it can be used on its own, as well as the available options. If SQLAlchemy is being used more fully, the connection pooling described below occurs automatically. The options are still available, though, so this core feature is a good place to start.
import sqlalchemy.pool as pool
import psycopg2 as psycopg

psycopg = pool.manage(psycopg)

# then connect normally
connection = psycopg.connect(database='test', username='scott', password='tiger')
This produces a sqlalchemy.pool.DBProxy object which supports the same connect() function as the original DBAPI module. Upon connection, a thread-local connection proxy object is returned, which delegates its calls to a real DBAPI connection object. This connection object is stored persistently within a connection pool (an instance of sqlalchemy.pool.Pool) that corresponds to the exact connection arguments sent to the connect() function. The connection proxy also returns a proxied cursor object upon calling connection.cursor(). When all cursors as well as the connection proxy are de-referenced, the connection is automatically made available again by the owning pool object.
Basically, the connect() function is used in its usual way, and the pool module transparently returns thread-local pooled connections. Each distinct set of connect arguments gets its own brand new connection pool; in this way, an application can maintain connections to multiple schemas and/or databases, and each unique connect argument set will be managed by a different pool object.
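For example (a sketch reusing the proxied psycopg module from above; the connection arguments are illustrative), two connect() calls with differing arguments are each served by their own pool:

# these two connections go through the same DBProxy, but are managed by
# two separate Pool instances, keyed on their connect arguments
conn_a = psycopg.connect(database='test', username='scott', password='tiger')
conn_b = psycopg.connect(database='test2', username='scott', password='tiger')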
When proxying a DBAPI module through the pool module, keyword options to manage() control how the connections are pooled:
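As a sketch (assuming manage() forwards its keyword arguments to the underlying pools, using the QueuePool options that appear later in this section: pool_size, max_overflow, and use_threadlocal):

import sqlalchemy.pool as pool
import psycopg2 as psycopg

# keyword arguments to manage() configure the pools created for each
# distinct set of connect arguments
psycopg = pool.manage(psycopg, pool_size=5, max_overflow=10, use_threadlocal=True)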
One level below using a DBProxy to make transparent pools is creating the pool yourself. The pool module comes with two implementations of connection pools: QueuePool and SingletonThreadPool. While QueuePool uses Queue.Queue to provide connections, SingletonThreadPool provides a single per-thread connection which SQLite requires.
Constructing your own pool involves passing a callable used to create a connection. Through this method, custom connection schemes can be made, such as a connection that automatically executes some initialization commands to start. The options from the previous section can be used as they apply to QueuePool or SingletonThreadPool.
import sqlalchemy.pool as pool
import psycopg2

def getconn():
    c = psycopg2.connect(username='ed', host='127.0.0.1', dbname='test')
    # execute an initialization function on the connection before returning
    c.cursor().execute("setup_encodings()")
    return c

p = pool.QueuePool(getconn, max_overflow=10, pool_size=5, use_threadlocal=True)
import sqlalchemy.pool as pool
import sqlite

def getconn():
    return sqlite.connect(filename='myfile.db')

# SQLite connections require the SingletonThreadPool
p = pool.SingletonThreadPool(getconn)
A database engine is a subclass of sqlalchemy.engine.SQLEngine, and is the starting point where SQLAlchemy provides a layer of abstraction on top of the various DBAPI2 database modules. It serves as an abstract factory for database-specific implementation objects, as well as a layer of abstraction over the most essential tasks of a database connection: connecting, executing queries, returning result sets, and managing transactions.
The average developer doesn't need to know anything about the interface or workings of a SQLEngine in order to use it. Simply creating one, and then specifying it when constructing tables and other SQL objects is all that's needed.
A SQLEngine is also a layer of abstraction on top of the connection pooling described in the previous section. While a DBAPI connection pool can be used explicitly alongside a SQLEngine, it's not really necessary. Once you have a SQLEngine, you can retrieve pooled connections directly from its underlying connection pool via its own connection() method. However, if you're exclusively using SQLAlchemy's SQL construction objects and/or object-relational mappers, all the details of connecting are handled by those libraries automatically.
Engines exist for SQLite, Postgres, MySQL, MS-SQL, and Oracle, using the pysqlite, psycopg (1 or 2), MySQLdb, adodbapi or pymssql, and cx_Oracle modules (there is also experimental support for Firebird). Each engine imports its corresponding module, which is required to be installed. For Postgres and Oracle, an alternate module may be specified at construction time as well.
The string-based argument names for connecting are translated to the appropriate names when the connection is made; argument names include "host" or "hostname" for the database host, "database", "db", or "dbname" for the database name (also "dsn" for Oracle), "user" or "username" for the user, and "password", "pw", or "passwd" for the password. SQLite expects "filename" or "file" for the filename, or if None it defaults to ":memory:".
The connection arguments can be specified as a string + dictionary pair, or a single URL-encoded string, as follows:
from sqlalchemy import *

# sqlite in memory
sqlite_engine = create_engine('sqlite', {'filename':':memory:'}, **opts)

# via URL
sqlite_engine = create_engine('sqlite://', **opts)

# sqlite using a file
sqlite_engine = create_engine('sqlite', {'filename':'querytest.db'}, **opts)

# via URL
sqlite_engine = create_engine('sqlite://filename=querytest.db', **opts)

# postgres
postgres_engine = create_engine('postgres',
                        {'database':'test',
                         'host':'127.0.0.1',
                         'user':'scott',
                         'password':'tiger'}, **opts)

# via URL
postgres_engine = create_engine('postgres://database=test&host=127.0.0.1&user=scott&password=tiger')

# mysql
mysql_engine = create_engine('mysql',
                        {
                            'db':'mydb',
                            'user':'scott',
                            'passwd':'tiger',
                            'host':'127.0.0.1'
                        }, **opts)

# oracle
oracle_engine = create_engine('oracle', {'dsn':'mydsn', 'user':'scott', 'password':'tiger'}, **opts)
Note that the general form of connecting to an engine is:
# separate arguments
engine = create_engine(
            <enginename>,
            {<named DBAPI arguments>},
            <sqlalchemy options>
         )

# url
engine = create_engine('<enginename>://<named DBAPI arguments>', <sqlalchemy options>)
A few useful methods off the SQLEngine are described here:
engine = create_engine('postgres://hostname=localhost&user=scott&password=tiger&database=test')

# get a pooled DBAPI connection
conn = engine.connection()

# create/drop tables based on table metadata objects
# (see the next section, Table Metadata, for info on table metadata)
engine.create(mytable)
engine.drop(mytable)

# get the DBAPI module being used
dbapi = engine.dbapi()

# get the default schema name
name = engine.get_default_schema_name()

# execute some SQL directly, returns a ResultProxy (see the SQL Construction section for details)
result = engine.execute("select * from table where col1=:col1", {'col1':'foo'})

# log a message to the engine's log stream
engine.log('this is a message')
The remaining arguments to create_engine are keyword arguments that are passed to the specific subclass of sqlalchemy.engine.SQLEngine being used, as well as to the underlying sqlalchemy.pool.Pool instance. All of the options described in the previous section Connection Pool Configuration can be specified, as well as engine-specific options:
pool: an instance of sqlalchemy.pool.Pool to be used as the underlying source for connections, overriding the engine's connect arguments (pooling is described in the previous section). If None, a default Pool (QueuePool or SingletonThreadPool as appropriate) will be created using the engine's connect arguments.
Example:
from sqlalchemy import *
import sqlalchemy.pool as pool
import MySQLdb

def getconn():
    return MySQLdb.connect(user='ed', db='mydb')

engine = create_engine('mysql', pool=pool.QueuePool(getconn, pool_size=20, max_overflow=40))
The ProxyEngine is useful for applications that need to swap engines at runtime, or to create their tables and mappers before they know what engine they will use. One use case is an application meant to be pluggable into a mix of other applications, such as a WSGI application. Well-behaved WSGI applications should be relocatable; and since that means that two versions of the same application may be running in the same process (or in the same thread at different times), WSGI applications ought not to depend on module-level or global configuration. Using the ProxyEngine allows a WSGI application to define tables and mappers in a module, but keep the specific database connection uri as an application instance or thread-local value.
The ProxyEngine is used in the same way as any other engine, with one additional method:
# define the tables and mappers
from sqlalchemy import *
from sqlalchemy.ext.proxy import ProxyEngine

engine = ProxyEngine()

users = Table('users', engine, ... )

class Users(object):
    pass

assign_mapper(Users, users)

def app(environ, start_response):
    # later, connect the proxy engine to a real engine via the connect() method
    engine.connect(environ['db_uri'])
    # now you have a real db connection and can select, insert, etc.
There is an instance of ProxyEngine available within the schema package as default_engine. You can construct Table objects without specifying the engine parameter, and they will connect to this engine by default. To connect the default_engine, use the global_connect function.
# define the tables and mappers
from sqlalchemy import *

# specify a table with no explicit engine
users = Table('users',
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String)
)

# connect the global proxy engine
global_connect('sqlite://filename=foo.db')

# create the table in the selected database
users.create()
A SQLEngine also provides an interface to the transactional capabilities of the underlying DBAPI connection object, as well as the connection object itself. Note that when using the object-relational-mapping package, described in a later section, basic transactional operation is handled for you automatically by its "Unit of Work" system; the methods described here will usually apply just to literal SQL update/delete/insert operations or those performed via the SQL construction library.
Typically, a connection is opened with autocommit=False. So to perform SQL operations and just commit as you go, you can simply pull a connection from the connection pool, keep it in the local scope, and call commit() on it as needed. As long as the connection remains referenced, all other SQL operations within the same thread will use this same connection, including those used by the SQL construction system as well as the object-relational mapper, both described in later sections:
conn = engine.connection()

# execute SQL via the engine
engine.execute("insert into mytable values ('foo', 'bar')")
conn.commit()

# execute SQL via the SQL construction library
mytable.insert().execute(col1='bat', col2='lala')
conn.commit()
There is a more automated way to do transactions, and that is to use the engine's begin()/commit() functionality. When the begin() method is called off the engine, a connection is checked out from the pool and stored in a thread-local context. That way, all subsequent SQL operations within the same thread will use that same connection. Subsequent commit() or rollback() operations are performed against that same connection. In effect, it's a more automated way to perform the "commit as you go" example above.
engine.begin()

engine.execute("insert into mytable values ('foo', 'bar')")
mytable.insert().execute(col1='foo', col2='bar')

engine.commit()
A traditional "rollback on exception" pattern looks like this:
engine.begin()
try:
    engine.execute("insert into mytable values ('foo', 'bar')")
    mytable.insert().execute(col1='foo', col2='bar')
except:
    engine.rollback()
    raise
engine.commit()
A shortcut which is equivalent to the above is provided by the transaction method:
def do_stuff():
    engine.execute("insert into mytable values ('foo', 'bar')")
    mytable.insert().execute(col1='foo', col2='bar')

engine.transaction(do_stuff)
An added bonus of the engine's transaction methods is "reentrant" functionality; once you call begin(), subsequent calls to begin() increment a counter that must be decremented by a corresponding commit() before an actual commit can happen. This way, any number of methods that want to ensure a transaction can call begin()/commit() and be nested arbitrarily:
# method_a starts a transaction and calls method_b
def method_a():
    engine.begin()
    try:
        method_b()
    except:
        engine.rollback()
        raise
    engine.commit()

# method_b starts a transaction, or joins the one already in progress,
# and does some SQL
def method_b():
    engine.begin()
    try:
        engine.execute("insert into mytable values ('bat', 'lala')")
        mytable.insert().execute(col1='bat', col2='lala')
    except:
        engine.rollback()
        raise
    engine.commit()

# call method_a
method_a()
Above, method_a is called first, which calls engine.begin(). Then it calls method_b. When method_b calls engine.begin(), it just increments a counter that is decremented when it calls commit(). If either method_a or method_b calls rollback(), the whole transaction is rolled back. The transaction is not committed until method_a calls the commit() method.
The object-relational-mapper capability of SQLAlchemy includes its own commit() method that gathers SQL statements into a batch and runs them within one transaction. That transaction is also invoked within the scope of the "reentrant" methodology above, so multiple objectstore.commit() operations can also be bundled into a larger database transaction via this methodology.
The core of SQLAlchemy's query and object mapping operations is table metadata, which are Python objects that describe tables. Metadata objects can be created by explicitly naming the table and all its properties, using the Table, Column, ForeignKey, and Sequence objects imported from sqlalchemy.schema along with a database engine constructed as described in the previous section, or they can be automatically pulled from an existing database schema. First, the explicit version:
from sqlalchemy import *

engine = create_engine('sqlite', {'filename':':memory:'}, **opts)

users = Table('users', engine,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('email_address', String(60), key='email'),
    Column('password', String(20), nullable = False)
)

user_prefs = Table('user_prefs', engine,
    Column('pref_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id"), nullable=False),
    Column('pref_name', String(40), nullable=False),
    Column('pref_value', String(100))
)
The specific datatypes, such as Integer, String, etc., are defined in The Types System and are automatically pulled in when you import * from sqlalchemy. Note that for Column objects, an alternate name can be specified via the "key" parameter; if this parameter is given, then all programmatic references to this Column object will be based on its key, instead of its actual column name.
Once constructed, the Table object provides a clean interface to the table's properties as well as that of its columns:
employees = Table('employees', engine,
    Column('employee_id', Integer, primary_key=True),
    Column('employee_name', String(60), nullable=False, key='name'),
    Column('employee_dept', Integer, ForeignKey("departments.department_id"))
)

# access the column "employee_id":
employees.columns.employee_id

# or just
employees.c.employee_id

# via string
employees.c['employee_id']

# iterate through all columns
for c in employees.c:
    # ...

# get the table's primary key columns
for primary_key in employees.primary_key:
    # ...

# get the table's foreign key objects:
for fkey in employees.foreign_keys:
    # ...

# access the table's SQLEngine object:
employees.engine

# access a column's name, type, nullable, primary key, foreign key
employees.c.employee_id.name
employees.c.employee_id.type
employees.c.employee_id.nullable
employees.c.employee_id.primary_key
employees.c.employee_dept.foreign_key

# get the "key" of a column, which defaults to its name, but can
# be any user-defined string:
employees.c.name.key

# access a column's table:
employees.c.employee_id.table is employees
>>> True

# get the table related by a foreign key
ftable = employees.c.employee_dept.foreign_key.column.table
Metadata objects can also be reflected from tables that already exist in the database. Reflection means that, given a table name, the names, datatypes, and attributes of all columns, including foreign keys, will be loaded automatically. This feature is supported by all database engines:
>>> messages = Table('messages', engine, autoload = True)
>>> [c.name for c in messages.columns]
['message_id', 'message_name', 'date']
Note that if a reflected table has a foreign key referencing another table, then the metadata for the related table will be loaded as well, even if it has not been defined by the application:
>>> shopping_cart_items = Table('shopping_cart_items', engine, autoload = True)
>>> print shopping_cart_items.c.cart_id.table.name
shopping_carts
To get direct access to 'shopping_carts', simply instantiate it via the Table constructor. You'll get the same instance of the shopping cart Table as the one that is attached to shopping_cart_items:
>>> shopping_carts = Table('shopping_carts', engine)
>>> shopping_carts is shopping_cart_items.c.cart_id.table
True
This works because when the Table constructor is called for a particular name and database engine, if the table has already been created then the instance returned will be the same as the original. This is a singleton constructor:
>>> news_articles = Table('news', engine,
...    Column('article_id', Integer, primary_key = True),
...    Column('url', String(250), nullable = False)
... )
>>> othertable = Table('news', engine)
>>> othertable is news_articles
True
Creating and dropping tables is easy; just use the create() and drop() methods:
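A minimal sketch using the employees table defined earlier; create() issues a CREATE TABLE statement against the table's engine, and drop() issues the corresponding DROP TABLE:

# emit CREATE TABLE for the table on its engine
employees.create()

# ... use the table ...

# emit DROP TABLE
employees.drop()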
SQLAlchemy includes flexible constructs for creating default values for columns upon the insertion of rows, as well as upon update. These defaults can take several forms: a constant, a Python callable to be pre-executed before the SQL is executed, a SQL expression or function to be pre-executed before the SQL is executed, a pre-executed Sequence (for databases that support sequences), or a "passive" default, which is a default function triggered by the database itself upon insert, the value of which can then be post-fetched by the engine, provided the row has a primary key with which to locate it.
A basic default is most easily specified by the "default" keyword argument to Column. This defines a value, function, or SQL expression that will be pre-executed to produce the new value, before the row is inserted:
# a function to create primary key ids
i = 0
def mydefault():
    global i
    i += 1
    return i

t = Table("mytable", db,
    # function-based default
    Column('id', Integer, primary_key=True, default=mydefault),

    # a scalar default
    Column('key', String(10), default="default")
)
The "default" keyword can also take SQL expressions, including select statements or direct function calls:
t = Table("mytable", db, Column('id', Integer, primary_key=True), # define 'create_date' to default to now() Column('create_date', DateTime, default=func.now()), # define 'key' to pull its default from the 'keyvalues' table Column('key', String(20), default=keyvalues.select(keyvalues.c.type='type1', limit=1)) )
The "default" keyword argument is shorthand for using a ColumnDefault object in a column definition. This syntax is optional, but is required for other types of defaults, futher described below:
Column('mycolumn', String(30), ColumnDefault(func.get_data()))
Similar to an on-insert default is an on-update default, which is most easily specified by the "onupdate" keyword to Column, which also can be a constant, plain Python function or SQL expression:
t = Table("mytable", db, Column('id', Integer, primary_key=True), # define 'last_updated' to be populated with current_timestamp (the ANSI-SQL version of now()) Column('last_updated', DateTime, onupdate=func.current_timestamp()), )
To use an explicit ColumnDefault object to specify an on-update, use the "for_update" keyword argument:
Column('mycolumn', String(30), ColumnDefault(func.get_data(), for_update=True))
A PassiveDefault indicates a column default or on-update value that is executed automatically by the database. This construct is used to specify a SQL function that will be specified as "DEFAULT" when creating tables, and also to indicate the presence of new data that is available to be "post-fetched" after an insert or update execution.
t = Table('test', e,
    Column('mycolumn', DateTime, PassiveDefault("sysdate"))
)
A create call for the above table will produce:
CREATE TABLE test (
    mycolumn datetime default sysdate
)
PassiveDefaults also send a message to the SQLEngine that data is available after update or insert. The object-relational mapper system uses this information to post-fetch rows after insert or update, so that instances can be refreshed with the new data. Below is a simplified version:
# table with passive defaults
mytable = Table('mytable', engine,
    Column('my_id', Integer, primary_key=True),

    # an on-insert database-side default
    Column('data1', Integer, PassiveDefault("d1_func")),

    # an on-update database-side default
    Column('data2', Integer, PassiveDefault("d2_func", for_update=True))
)

# insert a row
mytable.insert().execute(my_id=1)

# ask the engine: were there defaults fired off on that row ?
if mytable.engine.lastrow_has_defaults():
    # postfetch the row based on primary key.
    # this only works for a table with primary key columns defined
    primary_key = mytable.engine.last_inserted_ids()
    row = mytable.select(mytable.c.my_id == primary_key[0]).execute().fetchone()
When Tables are reflected from the database using autoload=True, any DEFAULT values set on the columns will be reflected in the Table object as PassiveDefault instances.
Current Postgres support does not rely upon OIDs to determine the identity of a row. This is because the usage of OIDs has been deprecated with Postgres, and they are disabled by default for table creates as of PG version 8. Psycopg2's "cursor.lastrowid" value only returns OIDs. Therefore, when inserting a new row which has passive defaults set on the primary key columns, the default function is still pre-executed, since SQLAlchemy would otherwise have no way of retrieving the row just inserted.
A table with a sequence looks like:
table = Table("cartitems", db, Column("cart_id", Integer, Sequence('cart_id_seq'), primary_key=True), Column("description", String(40)), Column("createdate", DateTime()) )
The Sequence is used with Postgres or Oracle to indicate the name of a database sequence that will be used to create default values for a column. When a table with a Sequence on a column is created by SQLAlchemy, the sequence is also created. Similarly, the sequence is dropped when the table is dropped. Sequences are typically used with primary key columns. When using Postgres, if an integer primary key column defines no explicit Sequence or other default method, SQLAlchemy will create the column with the SERIAL keyword, and will pre-execute a sequence named "tablename_columnname_seq" in order to retrieve new primary key values. Oracle, which has no "auto-increment" keyword, requires that a Sequence be created for a table if automatic primary key generation is desired. Note that for all databases, primary key values can always be explicitly stated within the bind parameters for any insert statement as well, removing the need for any kind of default generation function.
A Sequence object can be defined on a Table that is then used for a non-sequence-supporting database. In that case, the Sequence object is simply ignored. Note that a Sequence object is entirely optional for all databases except Oracle, as other databases offer options for auto-creating primary key values, such as AUTOINCREMENT, SERIAL, etc. SQLAlchemy will use these default methods for creating primary key values if no Sequence is present on the table metadata.
A sequence can also be specified with optional=True, which indicates that the Sequence should only be used on a database that requires an explicit sequence, and not on those that supply some other method of providing integer values. At the moment, it essentially means "use this sequence only with Oracle and not Postgres".
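A sketch of the idea, adding optional=True to the cartitems table shown above (assuming the flag is passed to the Sequence constructor):

table = Table("cartitems", db,
    # the sequence is only used on databases that require an explicit
    # sequence (e.g. Oracle); Postgres will use SERIAL instead
    Column("cart_id", Integer, Sequence('cart_id_seq', optional=True), primary_key=True),
    Column("description", String(40)),
    Column("createdate", DateTime())
)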
Indexes can be defined on table columns, including named indexes, unique or non-unique indexes, and multiple-column indexes. Indexes are included along with table create and drop statements. They are not used for any kind of run-time constraint checking; SQLAlchemy leaves that job to the expert on constraint checking, the database itself.
mytable = Table('mytable', engine,
    # define a unique index
    Column('col1', Integer, unique=True),

    # define a unique index with a specific name
    Column('col2', Integer, unique='mytab_idx_1'),

    # define a non-unique index
    Column('col3', Integer, index=True),

    # define a non-unique index with a specific name
    Column('col4', Integer, index='mytab_idx_2'),

    # pass the same name to multiple columns to add them to the same index
    Column('col5', Integer, index='mytab_idx_2'),

    Column('col6', Integer),
    Column('col7', Integer)
)

# create the table.  all the indexes will be created along with it.
mytable.create()

# indexes can also be specified standalone
i = Index('mytab_idx_3', mytable.c.col6, mytable.c.col7, unique=False)

# which can then be created separately (will also get created with table creates)
i.create()
A Table object created against a specific engine can be re-created against a new engine using the toengine method:
# create two engines
sqlite_engine = create_engine('sqlite', {'filename':'querytest.db'})
postgres_engine = create_engine('postgres',
                        {'database':'test',
                         'host':'127.0.0.1',
                         'user':'scott',
                         'password':'tiger'})

# load 'users' from the sqlite engine
users = Table('users', sqlite_engine, autoload=True)

# create the same Table object for the other engine
pg_users = users.toengine(postgres_engine)
Also available is the "database neutral" ansisql engine:
import sqlalchemy.ansisql as ansisql
generic_engine = ansisql.engine()

users = Table('users', generic_engine,
    Column('user_id', Integer),
    Column('user_name', String(50))
)
Flexible "multi-engined" tables can also be achieved via the proxy engine, described in the section Using the Proxy Engine.
TableClause and ColumnClause are "primitive" versions of the Table and Column objects which don't use engines at all; applications that just want to generate SQL strings but not directly communicate with a database can use TableClause and ColumnClause objects (accessed via 'table' and 'column'), which are non-singleton and serve as the "lexical" base classes of Table and Column:
tab1 = table('table1',
    column('id'),
    column('name'))

tab2 = table('table2',
    column('id'),
    column('email'))

tab1.select(tab1.c.name == 'foo')
TableClause and ColumnClause are strictly lexical. This means they are fully supported within the full range of SQL statement generation, but they don't support schema concepts like creates, drops, primary keys, defaults, nullable status, indexes, or foreign keys.
Note: This section describes how to use SQLAlchemy to construct SQL queries and receive result sets. It does not cover the object relational mapping capabilities of SQLAlchemy; that is covered later on in Data Mapping. However, both areas of functionality work similarly in how selection criteria are constructed, so if you are interested just in ORM, you should probably skim through basic WHERE clause construction before moving on.
Once you have used the sqlalchemy.schema module to construct your tables and/or reflect them from the database, performing SQL queries using those table metadata objects is done via the sqlalchemy.sql package. This package defines a large set of classes, each of which represents a particular kind of lexical construct within a SQL query; all are descendants of the common base class sqlalchemy.sql.ClauseElement. A full query is represented via a structure of ClauseElements. A set of reasonably intuitive creation functions is provided by the sqlalchemy.sql package to create these structures; these functions are described in the rest of this section.
To execute a query, you create its structure, then call the resulting structure's execute() method, which returns a cursor-like object (more on that later). The same clause structure can be used repeatedly. A ClauseElement is compiled into a string representation by an underlying SQLEngine object, which is located by searching through the clause's child items for a Table object that provides a reference to its SQLEngine.
The examples below all include a dump of the generated SQL corresponding to the query object, as well as a dump of the statement's bind parameters. In all cases, bind parameters are named parameters using the colon format (i.e. ':name'). A named parameter scheme, either ':name' or '%(name)s', is used with all databases, including those that use positional schemes. For those, the named-parameter statement and its bind values are converted to the proper list-based format right before execution. Therefore a SQLAlchemy application that uses ClauseElements can standardize on named parameters for all databases.
For this section, we will assume the following tables:
from sqlalchemy import *

db = create_engine('sqlite://filename=mydb', echo=True)

# a table to store users
users = Table('users', db,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses = Table('addresses', db,
    Column('address_id', Integer, primary_key = True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords = Table('keywords', db,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords = Table('userkeywords', db,
    Column('user_id', INT, ForeignKey("users")),
    Column('keyword_id', INT, ForeignKey("keywords"))
)
A select is done by constructing a Select object with the proper arguments, adding any extra arguments if desired, then calling its execute() method.
from sqlalchemy import *

# use the select() function defined in the sql package
s = select([users])

# or, call the select() method off of a Table object
s = users.select()

# then, call execute on the Select object:
c = s.execute()
# the SQL text of any clause object can also be viewed via the str() call:
>>> str(s)
SELECT users.user_id, users.user_name, users.password FROM users
The object returned by the execute call is a sqlalchemy.engine.ResultProxy object, which acts much like a DBAPI cursor object in the context of a result set, except that the rows returned can address their columns by ordinal position, column name, or even column object:
# select rows, get resulting ResultProxy object
c = users.select().execute()
# get one row
row = c.fetchone()

# get the 'user_id' column via integer index:
user_id = row[0]

# or column name
user_name = row['user_name']

# or column object
password = row[users.c.password]

# or column accessor
password = row.password

# ResultProxy object also supports fetchall()
rows = c.fetchall()

# or get the underlying DBAPI cursor object
cursor = c.cursor
A common need when writing statements that reference multiple tables is to create labels for columns, thereby separating columns from different tables with the same name. The Select construct supports automatic generation of column labels via the use_labels=True parameter:
c = select([users, addresses],
    users.c.user_id==addresses.c.address_id,
    use_labels=True).execute()
The table name part of the label is affected if you use a construct such as a table alias:
person = users.alias('person')
c = select([person, addresses],
    person.c.user_id==addresses.c.address_id,
    use_labels=True).execute()
You can also specify custom labels on a per-column basis using the label() function:
c = select([users.c.user_id.label('id'),
            users.c.user_name.label('name')]).execute()
Calling select off a table automatically generates a column clause which includes all the table's columns, in the order they are specified in the source Table object. But in addition to selecting all the columns off a single table, any set of columns can be specified, as well as full tables, and any combination of the two:
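A minimal sketch of mixing whole tables and individual columns in the column clause, using the users and addresses tables defined above:

# individual columns from two tables
select([users.c.user_name, addresses.c.street])

# a whole table combined with a single column from another table
select([users, addresses.c.zip])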
The WHERE condition is the named keyword argument whereclause, or the second positional argument to the select() constructor, and the first positional argument to the select() method of Table.
WHERE conditions are constructed using column objects, literal values, and functions defined in the sqlalchemy.sql module. Column objects override the standard Python operators to provide clause compositional objects, which compile down to SQL operations:
c = users.select(users.c.user_id == 7).execute()
Notice that the literal value "7" was broken out of the query and placed into a bind parameter. Databases such as Oracle must parse incoming SQL and create a "plan" when new queries are received, which is an expensive process. By using bind parameters, the same query with various literal values can have its plan compiled only once, and used repeatedly with less overhead.
More where clauses:
# another comparison operator
c = select([users], users.c.user_id>7).execute()

# OR keyword
c = users.select(or_(users.c.user_name=='jack', users.c.user_name=='ed')).execute()

# AND keyword
c = users.select(and_(users.c.user_name=='jack', users.c.password=='dog')).execute()

# NOT keyword
c = users.select(not_(
        or_(users.c.user_name=='jack', users.c.password=='dog')
    )).execute()

# IN clause
c = users.select(users.c.user_name.in_('jack', 'ed', 'fred')).execute()

# join users and addresses together
c = select([users, addresses], users.c.user_id==addresses.c.address_id).execute()

# join users and addresses together, but don't specify "addresses" in the
# selection criterion.  The WHERE criterion adds it to the FROM list
# automatically.
c = select([users], and_(
        users.c.user_id==addresses.c.user_id,
        users.c.user_name=='fred'
    )).execute()
Select statements can also generate a WHERE clause based on the parameters you give them. If a given parameter matches the name of a column or its "label" (the combined tablename + "_" + columnname), and does not already correspond to a bind parameter in the select object, it will be added as a comparison against that column. This is a shortcut to creating a full WHERE clause:
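A sketch of the shortcut, under the assumption that keyword arguments to execute() which match column names are converted to comparisons as just described:

# no WHERE clause on the select object itself; the 'user_name' keyword
# matches the users.user_name column and becomes a comparison against it
c = users.select().execute(user_name='jack')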
Supported column operators so far are all the numerical comparison operators, i.e. '==', '>', '>=', etc., as well as like(), startswith(), endswith(), between(), and in_(). Boolean operators include not_(), and_(), and or_(), which can also be used inline via '~', '&', and '|'. Math operators are '+', '-', '*', '/'. Any custom operator can be specified via the op() function shown below.
# "like" operator users.select(users.c.user_name.like('%ter')) # equality operator users.select(users.c.user_name == 'jane') # in opertator users.select(users.c.user_id.in_(1,2,3)) # and_, endswith, equality operators users.select(and_(addresses.c.street.endswith('green street'), addresses.c.zip=='11234')) # & operator subsituting for 'and_' users.select(addresses.c.street.endswith('green street') & (addresses.c.zip=='11234')) # + concatenation operator select([users.c.user_name + '_name']) # NOT operator users.select(~(addresses.c.street == 'Green Street')) # any custom operator select([users.c.user_name.op('||')('_category')])
For queries that don't contain any tables, the SQLEngine can be specified for any constructed statement via the engine keyword parameter:
# select a literal
select(["hi"], engine=myengine)

# select a function
select([func.now()], engine=db)
Functions can be specified using the func keyword:
select([func.count(users.c.user_id)]).execute()
users.select(func.substr(users.c.user_name, 1) == 'J').execute()
Functions also are callable as standalone values:
# call the "now()" function time = func.now(engine=myengine).scalar() # call myfunc(1,2,3) myvalue = func.myfunc(1, 2, 3, engine=db).execute() # or call them off the engine db.func.now().scalar()
You can drop in a literal value anywhere there isn't a column to attach to, via the literal() construct:
select([literal('foo') + literal('bar'), users.c.user_name]).execute()
# literals have all the same comparison functions as columns
select([literal('foo') == literal('bar')], engine=myengine).scalar()
Literals also take an optional type parameter to give the literal a type. This can sometimes be significant; for example, when using the "+" operator with SQLite, the String type is detected and the operator is converted to "||":
select([literal('foo', type=String) + 'bar'], engine=e).execute()
The ORDER BY clause of a select statement can be specified as individual columns to order by, within an array specified via the order_by parameter, with optional usage of the asc() and desc() functions:
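A minimal sketch using the order_by parameter and the asc()/desc() functions named above:

# order by a single column, ascending by default
users.select(order_by=[users.c.user_name])

# order by multiple columns, mixing ascending and descending
users.select(order_by=[asc(users.c.user_name), desc(users.c.user_id)])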
The DISTINCT, LIMIT, and OFFSET modifiers are specified as keyword arguments:
c = select([users.c.user_name], distinct=True).execute()
c = users.select(limit=10, offset=20).execute()
The Oracle driver does not support LIMIT and OFFSET directly, but instead wraps the generated query into a subquery and uses the "rownum" variable to control the rows selected (this is somewhat experimental).
As some of the examples indicated above, a regular inner join can be implicitly stated, just like in a SQL expression, by just specifying the tables to be joined as well as their join conditions:
addresses.select(addresses.c.user_id==users.c.user_id).execute()
There is also an explicit join constructor, which can be embedded into a select query via the from_obj parameter of the select statement:
addresses.select(from_obj=[
    addresses.join(users, addresses.c.user_id==users.c.user_id)
]).execute()
The join constructor can also be used by itself:
join(users, addresses, users.c.user_id==addresses.c.user_id).select().execute()
The join criterion in a join() call is optional. If not specified, the condition will be derived from the foreign key relationships of the two tables. If no criterion can be constructed, an exception will be raised.
join(users, addresses).select().execute()
Notice that this is the first example where the FROM criterion of the select statement is explicitly specified. In most cases, the FROM criterion is automatically determined from the columns requested as well as the WHERE clause. The from_obj keyword argument indicates a list of explicit FROM clauses to be used in the statement.
A join can be created on its own using the join or outerjoin functions, or can be created off of an existing Table or other selectable unit via the join or outerjoin methods:
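A minimal sketch of both forms, using the tables defined earlier (the module-level functions and the equivalent methods on Table):

# module-level join/outerjoin functions
join(users, addresses, users.c.user_id==addresses.c.user_id)
outerjoin(users, addresses, users.c.user_id==addresses.c.user_id)

# the same joins created from the Table object
users.join(addresses, users.c.user_id==addresses.c.user_id)
users.outerjoin(addresses, users.c.user_id==addresses.c.user_id)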
Aliases are used primarily when you want to use the same table more than once as a FROM expression in a statement:
address_b = addresses.alias('addressb')

# select users who have an address on Green street as well as Orange street
users.select(and_(
    users.c.user_id==addresses.c.user_id,
    addresses.c.street.like('%Green%'),
    users.c.user_id==address_b.c.user_id,
    address_b.c.street.like('%Orange%')
)).execute()
SQLAlchemy allows the creation of select statements from not just Table objects, but from a whole class of objects that implement the Selectable interface. This includes Tables, Aliases, Joins, and Selects. Therefore, if you have a Select, you can select from the Select:
">>>" s = users.select() ">>>" str(s) SELECT users.user_id, users.user_name, users.password FROM users ">>>" s = s.select() ">>>" str(s) SELECT user_id, user_name, password FROM (SELECT users.user_id, users.user_name, users.password FROM users)
Any Select, Join, or Alias object supports the same column accessors as a Table:
">>>" s = users.select() ">>>" [c.key for c in s.columns] ['user_id', 'user_name', 'password']
When you use use_labels=True in a Select object, the label version of the column names become the keys of the accessible columns. In effect you can create your own "view objects":
s = select([users, addresses],
    users.c.user_id==addresses.c.user_id,
    use_labels=True)

select([s.c.users_user_name, s.c.addresses_street, s.c.addresses_zip],
    s.c.addresses_city=='San Francisco').execute()
To specify a SELECT statement as one of the selectable units in a FROM clause, it usually should be given an alias.
s = users.select().alias('u')
select([addresses, s]).execute()
Select objects can be used in a WHERE condition, in operators such as IN:
# select user ids for all users whose name starts with a "p"
s = select([users.c.user_id], users.c.user_name.like('p%'))

# now select all addresses for those users
addresses.select(addresses.c.user_id.in_(s)).execute()
The sql package supports embedding select statements into other select statements as the criterion in a WHERE condition, or as one of the "selectable" objects in the FROM list of the query. It does not at the moment directly support embedding a SELECT statement as one of the column criterion for a statement, although this can be achieved via direct text insertion, described later.
Subqueries can be used in the column clause of a select statement by specifying the scalar=True flag:
select([table2.c.col1, table2.c.col2,
        select([table1.c.col1], table1.c.col2==7, scalar=True)])
When a select object is embedded inside of another select object, and both objects reference the same table, SQLAlchemy makes the assumption that the table should be correlated from the child query to the parent query. To disable this behavior, specify the flag correlate=False to the Select statement.
# make an alias of a regular select.
s = select([addresses.c.street], addresses.c.user_id==users.c.user_id).alias('s')

>>> str(s)
SELECT addresses.street FROM addresses, users
WHERE addresses.user_id = users.user_id

# now embed that select into another one.  the "users" table is removed from
# the embedded query's FROM list and is instead correlated to the parent query
s2 = select([users, s.c.street])

>>> str(s2)
SELECT users.user_id, users.user_name, users.password, s.street
FROM users, (SELECT addresses.street FROM addresses
WHERE addresses.user_id = users.user_id) s
An EXISTS clause can function as a higher-scaling version of an IN clause, and is usually used in a correlated fashion:
# find all users who have an address on Green street:
users.select(
    exists(
        [addresses.c.address_id],
        and_(
            addresses.c.user_id==users.c.user_id,
            addresses.c.street.like('%Green%')
        )
    )
)
Unions come in two flavors, UNION and UNION ALL, which are available via module level functions or methods off a Selectable:
union(
    addresses.select(addresses.c.street=='123 Green Street'),
    addresses.select(addresses.c.street=='44 Park Ave.'),
    addresses.select(addresses.c.street=='3 Mill Road'),
    order_by=[addresses.c.street]
).execute()
users.select(
    users.c.user_id==7
).union_all(
    users.select(
        users.c.user_id==9
    ),
    order_by=[users.c.user_id]   # order_by is an argument to union_all()
).execute()
Throughout all these examples, SQLAlchemy is busy creating bind parameters wherever literal expressions occur. You can also specify your own bind parameters with your own names, and use the same statement repeatedly. As mentioned at the top of this section, named bind parameters are always used regardless of the type of DBAPI in use; for DBAPIs that expect positional arguments, bind parameters are converted to lists right before execution, and pyformat strings in statements, i.e. '%(name)s', are converted to the appropriate positional style.
s = users.select(users.c.user_name==bindparam('username'))

s.execute(username='fred')
s.execute(username='jane')
s.execute(username='mary')
executemany() is also available, but that applies more to INSERT/UPDATE/DELETE, described later.
The generation of bind parameters is performed specific to the engine being used. The examples in this document all show "named" parameters like those used in sqlite and oracle. Depending on the parameter type specified by the DBAPI module, the correct bind parameter scheme will be used.
By throwing the compile() method onto the end of any query object, the query can be "compiled" by the SQLEngine into a sqlalchemy.sql.Compiled object just once, and the resulting compiled object reused, which eliminates repeated internal compilation of the SQL string:
s = users.select(users.c.user_name==bindparam('username')).compile()
s.execute(username='fred')
s.execute(username='jane')
s.execute(username='mary')
The sql package tries to allow free textual placement in as many ways as possible. In the examples below, note that the from_obj parameter is used only when no other information exists within the select object with which to determine table metadata. Also note that in a query where there isn't even table metadata used, the SQLEngine to be used for the query has to be explicitly specified:
# strings as column clauses
select(["user_id", "user_name"], from_obj=[users]).execute()

# strings for full column lists
select(
    ["user_id, user_name, password, addresses.*"],
    from_obj=[users.alias('u'), addresses]).execute()

# functions, etc.
select([users.c.user_id, "process_string(user_name)"]).execute()

# where clauses
users.select(and_(users.c.user_id==7, "process_string(user_name)=27")).execute()

# subqueries
users.select(
    "exists (select 1 from addresses where addresses.user_id=users.user_id)").execute()

# custom FROM objects
select(
    ["*"],
    from_obj=["(select user_id, user_name from users)"],
    engine=db).execute()
# a full query
text("select user_name from users", engine=db).execute()

# or call text() off of the engine
engine.text("select user_name from users").execute()

# execute off the engine directly - you must use the engine's native bind parameter
# style (i.e. named, pyformat, positional, etc.)
db.execute(
    "select user_name from users where user_id=:user_id",
    {'user_id':7})
Use the format ':paramname' to define bind parameters inside of a text block. They will be converted to the appropriate format upon compilation:
t = engine.text("select foo from mytable where lala=:hoho") r = t.execute(hoho=7)
Bind parameters can also be explicit, which allows typing information to be added. Just specify them as a list with keys that match those inside the textual statement:
t = engine.text("select foo from mytable where lala=:hoho", bindparams=[bindparam('hoho', type=types.String)]) r = t.execute(hoho="im hoho")
Result-row type processing can be added via the typemap argument, which is a dictionary of return columns mapped to types:
# specify DateTime type for the 'foo' column in the result set
# sqlite, for example, uses result-row post-processing to construct dates
t = engine.text("select foo from mytable where lala=:hoho",
        bindparams=[bindparam('hoho', type=types.String)],
        typemap={'foo':types.DateTime}
    )
r = t.execute(hoho="im hoho")

# 'foo' is a datetime
year = r.fetchone()['foo'].year
One of the primary motivations for a programmatic SQL library is to allow the piecemeal construction of a SQL statement based on program variables. All the above examples typically show Select objects being created all at once. The Select object also includes "builder" methods to allow building up an object. The below example is a "user search" function, where users can be selected based on primary key, user name, street address, keywords, or any combination:
def find_users(id=None, name=None, street=None, keyword_names=None):
    # note: the keyword parameter is named 'keyword_names' so that it does
    # not shadow the module-level 'keywords' table used below
    statement = users.select()
    if id is not None:
        statement.append_whereclause(users.c.user_id==id)
    if name is not None:
        statement.append_whereclause(users.c.user_name==name)
    if street is not None:
        # append_whereclause joins "WHERE" conditions together with AND
        statement.append_whereclause(users.c.user_id==addresses.c.user_id)
        statement.append_whereclause(addresses.c.street==street)
    if keyword_names is not None:
        statement.append_from(
            users.join(userkeywords, users.c.user_id==userkeywords.c.user_id).join(
                keywords, userkeywords.c.keyword_id==keywords.c.keyword_id))
        statement.append_whereclause(keywords.c.name.in_(*keyword_names))
        # to avoid multiple repeats, set query to be DISTINCT:
        statement.distinct = True
    return statement.execute()

find_users(id=7)
find_users(street='123 Green Street')

find_users(name='Jack', keyword_names=['jack','foo'])
An INSERT involves just one table. The Insert object is used via the insert() function, and the specified columns determine what columns show up in the generated SQL. If primary key columns are left out of the criterion, the SQL generator will try to populate them as specified by the particular database engine and sequences, i.e. relying upon an auto-incremented column or explicitly calling a sequence beforehand. Insert statements, as well as updates and deletes, can also execute multiple parameters in one pass via specifying an array of dictionaries as parameters.
The values to be populated for an INSERT or an UPDATE can be specified to the insert()/update() functions as the values named argument, or the query will be compiled based on the values of the parameters sent to the execute() method.
# basic insert
users.insert().execute(user_id=1, user_name='jack', password='asdfdaf')

# insert just user_name, NULL for others
# will auto-populate primary key columns if they are configured
# to do so
users.insert().execute(user_name='ed')

# INSERT with a list:
users.insert(values=(3, 'jane', 'sdfadfas')).execute()

# INSERT with user-defined bind parameters
i = users.insert(
    values={'user_name':bindparam('name'), 'password':bindparam('pw')}
)
i.execute(name='mary', pw='adas5fs')

# INSERT many - if no explicit 'values' parameter is sent,
# the first parameter list in the list determines
# the generated SQL of the insert (i.e. what columns are present)
# executemany() is used at the DBAPI level
users.insert().execute(
    {'user_id':7, 'user_name':'jack', 'password':'asdfasdf'},
    {'user_id':8, 'user_name':'ed', 'password':'asdffcadf'},
    {'user_id':9, 'user_name':'fred', 'password':'asttf'},
)
Updates work a lot like INSERTs, except there is an additional WHERE clause that can be specified.
# change 'jack' to 'ed'
users.update(users.c.user_name=='jack').execute(user_name='ed')

# use bind parameters
u = users.update(users.c.user_name==bindparam('name'),
        values={'user_name':bindparam('newname')})
u.execute(name='jack', newname='ed')

# update a column to another column
users.update(values={users.c.password:users.c.user_name}).execute()

# multi-update
users.update(users.c.user_id==bindparam('id')).execute(
        {'id':7, 'user_name':'jack', 'password':'fh5jks'},
        {'id':8, 'user_name':'ed', 'password':'fsr234ks'},
        {'id':9, 'user_name':'mary', 'password':'7h5jse'},
    )
A correlated update lets you update a table using selection from another table, or the same table:
s = select([addresses.c.city], addresses.c.user_id==users.c.user_id)
users.update(
    and_(users.c.user_id>10, users.c.user_id<20),
    values={users.c.user_name:s}
).execute()
A delete is formulated like an update, except there's no values:
users.delete(users.c.user_id==7).execute()

users.delete(users.c.user_name.like(bindparam('name'))).execute(
        {'name':'%Jack%'},
        {'name':'%Ed%'},
        {'name':'%Jane%'},
    )

users.delete(exists())
Data mapping describes the process of defining Mapper objects, which associate table metadata with user-defined classes.
The Mapper's role is to perform SQL operations upon the database, associating individual table rows with instances of those classes, and individual database columns with properties upon those instances, to transparently associate in-memory objects with a persistent database representation.
When a Mapper is created to associate a Table object with a class, all of the columns defined in the Table object are associated with the class via property accessors, which add overriding functionality to the normal process of setting and getting object attributes. These property accessors keep track of changes to object attributes; these changes will be stored to the database when the application "commits" the current transactional context (known as a Unit of Work). The __init__() method of the object is also decorated to communicate changes when new instances of the object are created.
The Mapper also provides the interface by which instances of the object are loaded from the database. The primary method for this is its select() method, which has similar arguments to a sqlalchemy.sql.Select object. But this select method executes automatically and returns results, instead of awaiting an execute() call. Instead of returning a cursor-like object, it returns an array of objects.
The three elements to be defined, i.e. the Table metadata, the user-defined class, and the Mapper, are typically defined as module-level variables, and may be defined in any fashion suitable to the application, with the only requirement being that the class and table metadata are described before the mapper. For the sake of example, we will be defining these elements close together, but this should not be construed as a requirement; since SQLAlchemy is not a framework, those decisions are left to the developer or an external framework.
This is the simplest form of a full "round trip" of creating table meta data, creating a class, mapping the class to the table, getting some results, and saving changes. For each concept, the following sections will dig in deeper to the available capabilities.
from sqlalchemy import *

# engine
engine = create_engine("sqlite://mydb.db")

# table metadata
users = Table('users', engine,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# class definition
class User(object):
    pass

# create a mapper
usermapper = mapper(User, users)

# select
user = usermapper.select_by(user_name='fred')[0]
# modify
user.user_name = 'fred jones'

# commit - saves everything that changed
objectstore.commit()
For convenience's sake, the Mapper can be attached as an attribute on the class itself as well:
User.mapper = mapper(User, users)
userlist = User.mapper.select_by(user_id=12)
There is also a full-blown "monkeypatch" function that creates a primary mapper, attaches the above mapper class property, and also the methods get, get_by, select, select_by, selectone, selectfirst, commit, expire, refresh, expunge, and delete:
# "assign" a mapper to the User class/users table assign_mapper(User, users) # methods are attached to the class for selecting userlist = User.select_by(user_id=12) myuser = User.get(1) # mark an object as deleted for the next commit myuser.delete() # commit the changes on a specific object myotheruser.commit()
Other methods of associating mappers and finder methods with their corresponding classes, such as via common base classes or mixins, can be devised as well. SQLAlchemy does not aim to dictate application architecture and will always allow the broadest variety of architectural patterns, but may include more helper objects and suggested architectures in the future.
A common request is the ability to create custom class properties that override the behavior of setting/getting an attribute. Currently, the easiest way to do this in SQLAlchemy is how it would be done in any Python program; define your attribute with a different name, such as "_attribute", and use a property to get/set its value. The mapper just needs to be told of the special name:
class MyClass(object):
    def _set_email(self, email):
        self._email = email
    def _get_email(self):
        return self._email
    email = property(_get_email, _set_email)

m = mapper(MyClass, mytable, properties = {
    # map the '_email' attribute to the "email" column
    # on the table
    '_email': mytable.c.email
})
In a later release, SQLAlchemy will also allow the _get_email and _set_email methods to be attached directly to the "email" property created by the mapper, and will also allow this association to occur via decorators.
There are a variety of ways to select from a mapper. These range from minimalist to explicit. Below is a synopsis of these methods:
# select_by, using property names or column names as keys
# the keys are grouped together by an AND operator
result = mapper.select_by(name='john', street='123 green street')

# select_by can also combine SQL criterion with key/value properties
result = mapper.select_by(users.c.user_name=='john',
        addresses.c.zip_code=='12345', street='123 green street')

# get_by, which takes the same arguments as select_by
# returns a single scalar result or None if no results
user = mapper.get_by(id=12)

# "dynamic" versions of select_by and get_by - everything past the
# "select_by_" or "get_by_" is used as the key, and the function argument
# as the value
result = mapper.select_by_name('fred')
u = mapper.get_by_name('fred')

# get an object directly from its primary key. this will bypass the SQL
# call if the object has already been loaded
u = mapper.get(15)

# get an object that has a composite primary key of three columns.
# the order of the arguments matches that of the table meta data.
myobj = mapper.get(27, 3, 'receipts')

# using a WHERE criterion
result = mapper.select(or_(users.c.user_name == 'john', users.c.user_name=='fred'))

# using a WHERE criterion to get a scalar
u = mapper.selectfirst(users.c.user_name=='john')

# selectone() is a stricter version of selectfirst() which
# will raise an exception if there is not exactly one row
u = mapper.selectone(users.c.user_name=='john')

# using a full select object
result = mapper.select(users.select(users.c.user_name=='john'))

# using straight text
result = mapper.select_text("select * from users where user_name='fred'")

# or using a "text" object
result = mapper.select(text("select * from users where user_name='fred'", engine=engine))
Some of the above examples illustrate the usage of the mapper's Table object to provide the columns for a WHERE clause. These columns are also accessible off of the mapped class directly. When a mapper is assigned to a class, it also attaches a special property accessor c
to the class itself, which can be used just like the table metadata to access the columns of the table:
User.mapper = mapper(User, users)
userlist = User.mapper.select(User.c.user_id==12)
When objects corresponding to mapped classes are created or manipulated, all changes are logged by a package called sqlalchemy.mapping.objectstore
. The changes are then written to the database when an application calls objectstore.commit()
. This pattern is known as a Unit of Work, and has many advantages over saving individual objects or attributes on those objects with individual method invocations. Domain models can be built with far greater complexity with no concern over the order of saves and deletes, excessive database round-trips and write operations, or deadlocking issues. The commit() operation uses a transaction as well, and will also perform "concurrency checking" to ensure the proper number of rows were in fact affected (not supported with the current MySQL drivers). Transactional resources are used effectively in all cases; the unit of work handles all the details.
The Unit of Work is a powerful tool, and has some important concepts that must be understood in order to use it effectively. While this section illustrates rudimentary Unit of Work usage, it is strongly encouraged to consult the Unit of Work section for a full description of all its operations, including session control, deletion, and developmental guidelines.
When a mapper is created, the target class has its mapped properties decorated by specialized property accessors that track changes, and its __init__()
method is also decorated to mark new objects as "new".
User.mapper = mapper(User, users)

# create a new User
myuser = User()
myuser.user_name = 'jane'
myuser.password = 'hello123'

# create another new User
myuser2 = User()
myuser2.user_name = 'ed'
myuser2.password = 'lalalala'

# load a third User from the database
myuser3 = User.mapper.select(User.c.user_name=='fred')[0]
myuser3.user_name = 'fredjones'

# save all changes
objectstore.commit()
In the examples above, we defined a User class with basically no properties or methods. There's no particular reason it has to be this way; the class can explicitly set up whatever properties it wants, whether or not they will be managed by the mapper. It can also specify a constructor, with the restriction that the constructor is able to function with no arguments being passed to it (this restriction can be lifted with some extra parameters to the mapper; more on that later):
class User(object):
    def __init__(self, user_name = None, password = None):
        self.user_id = None
        self.user_name = user_name
        self.password = password
    def get_name(self):
        return self.user_name
    def __repr__(self):
        return "User id %s name %s password %s" % (
            repr(self.user_id), repr(self.user_name), repr(self.password))

User.mapper = mapper(User, users)

u = User('john', 'foo')
objectstore.commit()
>>> u
User id 1 name 'john' password 'foo'
Recent versions of SQLAlchemy will only include modified object attributes in the UPDATE statements generated upon commit. This conserves database traffic and also interacts properly with a "deferred" attribute, which is a mapped object attribute against the mapper's primary table that isn't loaded until referenced by the application.
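As a quick sketch of this behavior (names follow the earlier examples; the exact SQL emitted depends on the mapper configuration), changing a single attribute on a loaded object results in an UPDATE that only includes that column:

    # load a user and modify just one attribute
    u = User.mapper.get_by(user_name='fred')
    u.password = 'newpassword'

    # the UPDATE generated here should touch only the changed column, roughly:
    # UPDATE users SET password=? WHERE users.user_id = ?
    objectstore.commit()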
So that covers how to map the columns in a table to an object, how to load objects, create new ones, and save changes. The next step is how to define an object's relationships to other database-persisted objects. This is done via the relation
function provided by the mapper module. So with our User class, let's also define the User as having one or more mailing addresses. First, the table metadata:
from sqlalchemy import *
engine = create_engine('sqlite://filename=mydb')

# define user table
users = Table('users', engine,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(16)),
    Column('password', String(20))
)

# define user address table
addresses = Table('addresses', engine,
    Column('address_id', Integer, primary_key=True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)
Of importance here is the addresses table's definition of a foreign key relationship to the users table, relating the user_id column into a parent-child relationship. When a Mapper wants to indicate a relation of one object to another, this ForeignKey object is the default method by which the relationship is determined (although if you didn't define ForeignKeys, or you want to specify explicit relationship columns, that is available as well).
So then let's define two classes, the familiar User class, as well as an Address class:
class User(object):
    def __init__(self, user_name = None, password = None):
        self.user_name = user_name
        self.password = password

class Address(object):
    def __init__(self, street=None, city=None, state=None, zip=None):
        self.street = street
        self.city = city
        self.state = state
        self.zip = zip
And then a Mapper that will define a relationship of the User and the Address classes to each other as well as their table metadata. We will add an additional mapper keyword argument properties
which is a dictionary relating the name of an object property to a database relationship, in this case a relation
object against a newly defined mapper for the Address class:
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses))
})
Let's do some operations with these classes and see what happens:
u = User('jane', 'hihilala')
u.addresses.append(Address('123 anywhere street', 'big city', 'UT', '76543'))
u.addresses.append(Address('1 Park Place', 'some other city', 'OK', '83923'))

objectstore.commit()
A lot just happened there! The Mapper object figured out how to relate rows in the addresses table to the users table, and also upon commit had to determine the proper order in which to insert rows. After the insert, all the User and Address objects have all their new primary and foreign keys populated.
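As a small illustration (a sketch, continuing the example above), the newly generated key values are immediately readable off the objects once the commit completes:

    # after objectstore.commit(), primary and foreign keys are populated
    print u.user_id                  # the User's newly generated primary key
    print u.addresses[0].address_id  # the first Address's primary key
    print u.addresses[0].user_id     # foreign key, set to match u.user_id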
Also notice that when we created a Mapper on the User class which defined an 'addresses' relation, the newly created User instance magically had an "addresses" attribute which behaved like a list. This list is in reality a property accessor function, which returns an instance of sqlalchemy.util.HistoryArraySet
, which fulfills the full set of Python list accessors, but maintains a unique set of objects (based on their in-memory identity), and also tracks additions and deletions to the list:
del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

objectstore.commit()
So our one address that was removed from the list, was updated to have a user_id of None
, and a new address object was inserted to correspond to the new Address added to the User. But now there's a mailing address with no user_id floating around in the database, of no use to anyone. How can we avoid this? This is achieved by using the private=True
parameter of relation
:
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), private=True)
})

del u.addresses[1]
u.addresses.append(Address('27 New Place', 'Houston', 'TX', '34839'))

objectstore.commit()
In this case, with the private flag set, the element that was removed from the addresses list was also removed from the database. By specifying the private
flag on a relation, the Mapper is told that these related objects exist only as children of the parent object; when they are no longer associated with a parent, they should be deleted.
By creating relations with the backref
keyword, a bi-directional relationship can be created which will keep both ends of the relationship updated automatically, even without any database queries being executed. Below, the User mapper is created with an "addresses" property, and the corresponding Address mapper receives a "backreference" to the User object via the property name "user":
Address.mapper = mapper(Address, addresses)

User.mapper = mapper(User, users, properties = {
    'addresses' : relation(Address.mapper, backref='user')
})

u = User('fred', 'hi')
a1 = Address('123 anywhere street', 'big city', 'UT', '76543')
a2 = Address('1 Park Place', 'some other city', 'OK', '83923')

# append a1 to u
u.addresses.append(a1)

# attach u to a2
a2.user = u

# the bi-directional relation is maintained
>>> u.addresses == [a1, a2]
True
>>> a1.user is u and a2.user is u
True
The backreference feature also works with many-to-many relationships, which are described later. When creating a backreference, a corresponding property is placed on the child mapper. The default arguments to this property can be overridden using the backref()
function:
Address.mapper = mapper(Address, addresses)

User.mapper = mapper(User, users, properties = {
    'addresses' : relation(Address.mapper,
                    backref=backref('user', lazy=False, private=True))
})
The mapper package has a helper function cascade_mappers()
which can simplify the task of linking several mappers together. Given a list of classes and/or mappers, it identifies the foreign key relationships between the given mappers or corresponding class mappers, and creates relation() objects representing those relationships, including a backreference. It also attempts to find the "secondary" table in a many-to-many relationship. The names of the relations are a lowercase version of the related class name. In the case of one-to-many or many-to-many, the name is "pluralized", which currently is based on the English language (i.e. an 's' or 'es' added to it):
# create two mappers. the 'users' and 'addresses' tables have a foreign key
# relationship
mapper1 = mapper(User, users)
mapper2 = mapper(Address, addresses)

# cascade the two mappers together (can also specify User, Address as the arguments)
cascade_mappers(mapper1, mapper2)

# two new object instances
u = User('user1')
a = Address('test')

# "addresses" and "user" property are automatically added
u.addresses.append(a)
print a.user
We've seen how the relation
specifier affects the saving of an object and its child items; how does it affect selecting them? By default, the relation keyword indicates that the related property should have a Lazy Loader attached to it when instances of the parent object are loaded from the database; this is just a callable function that, when accessed, will invoke a second SQL query to load the child objects of the parent.
# define a mapper
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), private=True)
})

# select users where username is 'jane', get the first element of the list
# this will incur a load operation for the parent table
user = User.mapper.select_by(user_name='jane')[0]

# iterate through the User object's addresses. this will incur an
# immediate load of those child items
for a in user.addresses:
    print repr(a)
In mappers that have relationships, the select_by
method and its cousins include special functionality that can be used to create joins. Just specify a key in the argument list which is not present in the primary mapper's list of properties or columns, but is present in the property list of one of its relationships:
l = User.mapper.select_by(street='123 Green Street')
The above example is shorthand for:
l = User.mapper.select(and_(
    Address.c.user_id==User.c.user_id,
    Address.c.street=='123 Green Street')
)
Once the child list of Address objects is loaded, it is done loading for the lifetime of the object instance. Changes to the list will not be interfered with by subsequent loads, and upon commit those changes will be saved. Similarly, if a new User object is created and child Address objects added, a subsequent select operation which happens to touch upon that User instance, will also not affect the child list, since it is already loaded.
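Here is a short sketch of that behavior, assuming the User/Address mappers defined above:

    # load a user; the 'addresses' list is loaded at this point (lazily or eagerly)
    u = User.mapper.select_by(user_name='jane')[0]

    # make an in-memory change to the list
    u.addresses.append(Address('52 Main Street', 'anytown', 'UT', '76543'))

    # a subsequent select that touches the same User returns the same instance,
    # and the pending change to 'addresses' is left alone
    u2 = User.mapper.select_by(user_name='jane')[0]
    assert u is u2

    # the appended Address gets saved here
    objectstore.commit()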
The issue of when the mapper actually gets brand new objects from the database versus when it assumes the in-memory version is fine the way it is, is a subject of transactional scope. Described in more detail in the Unit of Work section, for now it should be noted that the total storage of all newly created and selected objects, within the scope of the current thread, can be reset via releasing or otherwise disregarding all current object instances, and calling:
objectstore.clear()
This operation will clear out all currently mapped object instances, and subsequent select statements will load fresh copies from the database.
To operate upon a single object, just use the remove
function:
# (this function coming soon)
objectstore.remove(myobject)
With just a single parameter "lazy=False" specified to the relation object, the parent and child SQL queries can be joined together.
Address.mapper = mapper(Address, addresses)

User.mapper = mapper(User, users, properties = {
    'addresses' : relation(Address.mapper, lazy=False)
})

user = User.mapper.get_by(user_name='jane')

for a in user.addresses:
    print repr(a)
Above, a pretty ambitious query is generated just by specifying that the User should be loaded with its child Addresses in one query. When the mapper processes the results, it uses an Identity Map to keep track of objects that were already loaded, based on their primary key identity. Through this method, the redundant rows produced by the join are organized into the distinct object instances they represent.
The generation of this query is also immune to the effects of additional joins being specified in the original query. To use our select_by example above, joining against the "addresses" table to locate users with a certain street results in this behavior:
users = User.mapper.select_by(street='123 Green Street')
The join implied by passing the "street" parameter is converted into an "aliasized" clause by the eager loader, so that it does not conflict with the join used to eager load the child address objects.
The options
method of mapper provides an easy way to get alternate forms of a mapper from an original one. The most common use of this feature is to change the "eager/lazy" loading behavior of a particular mapper, via the functions eagerload()
, lazyload()
and noload()
:
# user mapper with lazy addresses
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses))
})

# make an eager loader
eagermapper = User.mapper.options(eagerload('addresses'))
u = eagermapper.select()

# make another mapper that won't load the addresses at all
plainmapper = User.mapper.options(noload('addresses'))

# multiple options can be specified
mymapper = oldmapper.options(lazyload('tracker'), noload('streets'), eagerload('members'))

# to specify a relation on a relation, separate the property names by a "."
mymapper = oldmapper.options(eagerload('orders.items'))
The above examples focused on the "one-to-many" relationship. Other forms of relationship work just as easily, as the relation
function can usually figure out what you want:
# a table to store a user's preferences for a site
prefs = Table('user_prefs', engine,
    Column('pref_id', Integer, primary_key = True),
    Column('stylename', String(20)),
    Column('save_password', Boolean, nullable = False),
    Column('timezone', CHAR(3), nullable = False)
)

# user table gets 'preference_id' column added
users = Table('users', engine,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(16), nullable = False),
    Column('password', String(20), nullable = False),
    Column('preference_id', Integer, ForeignKey("prefs.pref_id"))
)

# class definition for preferences
class UserPrefs(object):
    pass
UserPrefs.mapper = mapper(UserPrefs, prefs)

# address mapper
Address.mapper = mapper(Address, addresses)

# make a new mapper referencing everything.
m = mapper(User, users, properties = dict(
    addresses = relation(Address.mapper, lazy=True, private=True),
    preferences = relation(UserPrefs.mapper, lazy=False, private=True),
))

# select
user = m.get_by(user_name='fred')
save_password = user.preferences.save_password

# modify
user.preferences.stylename = 'bluesteel'
user.addresses.append(Address('freddy@hi.org'))

# commit
objectstore.commit()
The relation
function handles a basic many-to-many relationship when you specify the association table:
articles = Table('articles', engine,
    Column('article_id', Integer, primary_key = True),
    Column('headline', String(150), key='headline'),
    Column('body', TEXT, key='body'),
)

keywords = Table('keywords', engine,
    Column('keyword_id', Integer, primary_key = True),
    Column('keyword_name', String(50))
)

itemkeywords = Table('article_keywords', engine,
    Column('article_id', Integer, ForeignKey("articles.article_id")),
    Column('keyword_id', Integer, ForeignKey("keywords.keyword_id"))
)

# class definitions
class Keyword(object):
    def __init__(self, name = None):
        self.keyword_name = name

class Article(object):
    pass

# define a mapper that does many-to-many on the 'itemkeywords' association
# table
Article.mapper = mapper(Article, articles, properties = dict(
    keywords = relation(mapper(Keyword, keywords), itemkeywords, lazy=False)
))

article = Article()
article.headline = 'a headline'
article.body = 'this is the body'
article.keywords.append(Keyword('politics'))
article.keywords.append(Keyword('entertainment'))

objectstore.commit()

# select articles based on a keyword. select_by will handle the extra joins.
articles = Article.mapper.select_by(keyword_name='politics')

# modify
a = articles[0]
del a.keywords[:]
a.keywords.append(Keyword('topstories'))
a.keywords.append(Keyword('government'))

# commit. individual INSERT/DELETE operations will take place only for the list
# elements that changed.
objectstore.commit()
Many to Many can also be done with an association object, that adds additional information about how two items are related. This association object is set up in basically the same way as any other mapped object. However, since an association table typically has no primary key columns, you have to tell the mapper what columns will compose its "primary key", which are the two (or more) columns involved in the association. Also, the relation function needs an additional hint as to the fact that this mapped object is an association object, via the "association" argument which points to the class or mapper representing the other side of the association.
# add "attached_by" column which will reference the user who attached this keyword itemkeywords = Table('article_keywords', engine, Column('article_id', Integer, ForeignKey("articles.article_id")), Column('keyword_id', Integer, ForeignKey("keywords.keyword_id")), Column('attached_by', Integer, ForeignKey("users.user_id")) ) # define an association class class KeywordAssociation(object): pass # mapper for KeywordAssociation # specify "primary key" columns manually KeywordAssociation.mapper = mapper(KeywordAssociation, itemkeywords, primary_key = [itemkeywords.c.article_id, itemkeywords.c.keyword_id], properties={ 'keyword' : relation(Keyword, lazy = False), # uses primary Keyword mapper 'user' : relation(User, lazy = True) # uses primary User mapper } ) # mappers for Users, Keywords User.mapper = mapper(User, users) Keyword.mapper = mapper(Keyword, keywords) # define the mapper. m = mapper(Article, articles, properties={ 'keywords':relation(KeywordAssociation.mapper, lazy=False, association=Keyword) } ) # bonus step - well, we do want to load the users in one shot, # so modify the mapper via an option. # this returns a new mapper with the option switched on. m2 = mapper.options(eagerload('keywords.user')) # select by keyword again sqlalist = m2.select_by(keyword_name='jacks_stories')
# user is available for a in alist: for k in a.keywords: if k.keyword.name == 'jacks_stories': print k.user.user_name
The concept behind Unit of Work is to track modifications to a set of objects, and then be able to commit those changes to the database in a single operation. There are a lot of advantages to this, including that your application doesn't need to worry about individual save operations on objects, nor about the required order for those operations, nor about excessive repeated calls to save operations that would be more efficiently aggregated into one step. It also simplifies database transactions, providing a neat package with which to insert into the traditional database begin/commit phase.
SQLAlchemy's unit of work includes these functions:
The current unit of work is accessed via a Session object. The Session is available in a thread-local context from the objectstore module as follows:
# get the current thread's session
session = objectstore.get_session()
The Session object acts as a proxy to an underlying UnitOfWork object. Common methods include commit(), begin(), clear(), and delete(). Most of these methods are available at the module level in the objectstore module, which operate upon the Session returned by the get_session() function:
# this...
objectstore.get_session().commit()

# is the same as this:
objectstore.commit()
A description of the most important methods and concepts follows.
The first concept to understand about the Unit of Work is that it is keeping track of all mapped objects which have been loaded from the database, as well as all mapped objects which have been saved to the database in the current session. This means that every time you issue a select
call to a mapper which returns results, all of those objects are now installed within the current Session, mapped to their identity.
In particular, it is ensuring that only one instance of a particular object, corresponding to a particular database identity, exists within the Session at one time. By "database identity" we mean a table or relational concept in the database combined with a particular primary key in that table. The session accomplishes this task using a dictionary known as an Identity Map. When select
or get
calls on mappers issue queries to the database, they will in nearly all cases go out to the database on each call to fetch results. However, when the mapper instantiates objects corresponding to the result set rows it receives, it will check the current identity map first before instantiating a new object, and return the same instance already present in the identity map if it already exists.
Example:
mymapper = mapper(MyClass, mytable)

obj1 = mymapper.selectfirst(mytable.c.id==15)
obj2 = mymapper.selectfirst(mytable.c.id==15)

>>> obj1 is obj2
True
The Identity Map is an instance of weakref.WeakValueDictionary
, so that when an in-memory object falls out of scope, it will be removed automatically. However, this may not be instant if there are circular references upon the object. The current SA attributes implementation places some circular refs upon objects, although this may change in the future. There are other ways to remove object instances from the current session, as well as to clear the current session entirely, which are described later in this section.
To view the Session's identity map, it is accessible via the identity_map
accessor, and is an instance of weakref.WeakValueDictionary
:
>>> objectstore.get_session().identity_map.values()
[<__main__.User object at 0x712630>, <__main__.Address object at 0x712a70>]
The identity of each object instance is available via the _instance_key property attached to each object instance, and is a tuple consisting of the object's class and an additional tuple of primary key values, in the order that they appear within the table definition:
>>> obj._instance_key
(<class 'test.tables.User'>, (7,))
At the moment that an object is assigned this key, it is also added to the current thread's unit-of-work's identity map.
The get() method on a mapper, which retrieves an object based on primary key identity, also checks in the current identity map first to save a database round-trip if possible. In the case of an object lazy-loading a single child object, the get() method is used as well, so scalar-based lazy loads may in some cases not query the database; this is particularly important for backreference relationships as it can save a lot of queries.
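A minimal sketch of that shortcut (assuming the users table has a single integer primary key):

    m = mapper(User, users)

    u1 = m.get(15)   # may issue a SELECT if the object isn't already loaded
    u2 = m.get(15)   # found in the identity map; no SQL is issued
    assert u1 is u2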
Methods on mappers and the objectstore module, which are relevant to identity include the following:
# assume 'm' is a mapper
m = mapper(User, users)

# get the identity key corresponding to a primary key
key = m.identity_key(7)

# for composite key, list out the values in the order they
# appear in the table
key = m.identity_key(12, 'rev2')

# get the identity key given a primary key
# value as a tuple and a class
key = objectstore.get_id_key((12, 'rev2'), User)

# get the identity key for an object, whether or not it actually
# has one attached to it (m is the mapper for obj's class)
key = m.instance_key(obj)

# is this key in the current identity map?
session.has_key(key)

# is this object in the current identity map?
session.has_instance(obj)

# get this object from the current identity map based on
# singular/composite primary key, or if not go
# and load from the database
obj = m.get(12, 'rev2')
The next concept is that in addition to the Session storing a record of all objects loaded or saved, it also stores records of all newly created objects, records of all objects whose attributes have been modified, records of all objects that have been marked as deleted, and records of all modified list-based attributes where additions or deletions have occurred. These lists are used when a commit()
call is issued to save all changes. After the commit occurs, these lists are all cleared out.
These records are all tracked by a collection of Set
objects (which are a SQLAlchemy-specific instance called a HashSet
) that are also viewable off the Session:
# new objects that were just constructed
session.new

# objects that exist in the database, that were modified
session.dirty

# objects that have been marked as deleted via session.delete(obj)
session.deleted

# list-based attributes that have been appended to or removed from
session.modified_lists
Here's an interactive example, assuming the User
and Address
mapper setup first outlined in the data mapping examples above:
>>> # get the current thread's session
>>> session = objectstore.get_session()

>>> # create a new object, with a list-based attribute
>>> # containing two more new objects
>>> u = User(user_name='Fred')
>>> u.addresses.append(Address(city='New York'))
>>> u.addresses.append(Address(city='Boston'))

>>> # objects are in the "new" list
>>> session.new
[<__main__.User object at 0x713630>, <__main__.Address object at 0x713a70>, <__main__.Address object at 0x713b30>]

>>> # view the "modified lists" member,
>>> # reveals our two Address objects as well, inside of a list
>>> session.modified_lists
[[<__main__.Address object at 0x713a70>, <__main__.Address object at 0x713b30>]]

>>> # let's view what the class/ID is for the list object
>>> ["%s %s" % (l.__class__, id(l)) for l in session.modified_lists]
['sqlalchemy.mapping.unitofwork.UOWListElement 7391872']

>>> # now commit
>>> session.commit()

>>> # the "new" list is now empty
>>> session.new
[]

>>> # the "modified lists" list is now empty
>>> session.modified_lists
[]

>>> # now let's modify an object
>>> u.user_name='Ed'

>>> # it gets placed in the "dirty" list
>>> session.dirty
[<__main__.User object at 0x713630>]

>>> # delete one of the addresses
>>> session.delete(u.addresses[0])

>>> # and also delete it off the User object, note that
>>> # this is *not automatic* when using session.delete()
>>> del u.addresses[0]
>>> session.deleted
[<__main__.Address object at 0x713a70>]

>>> # commit
>>> session.commit()

>>> # all lists are cleared out
>>> session.new, session.dirty, session.modified_lists, session.deleted
([], [], [], [])

>>> # identity map has the User and the one remaining Address
>>> session.identity_map.values()
[<__main__.Address object at 0x713b30>, <__main__.User object at 0x713630>]
Unlike the identity map, the new
, dirty
, modified_lists
, and deleted
lists are not weak referencing. This means if you abandon all references to new or modified objects within a session, they are still present and will be saved on the next commit operation, unless they are removed from the Session explicitly (more on that later). The new
list may change in a future release to be weak-referencing, however for the deleted
list, one can see that it's quite natural for an object marked as deleted to have no references in the application, yet a DELETE operation is still required.
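A small sketch of that behavior, assuming a freshly cleared session: even if the application drops its own reference to a new object, the Session still holds it and will save it on the next commit.

    session = objectstore.get_session()

    u = User()
    u.user_name = 'wendy'

    # abandon our reference; the object is still tracked in session.new
    del u
    assert len(session.new) == 1

    # the 'wendy' row is still INSERTed here
    session.commit()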
This is the main gateway to what the Unit of Work does best, which is save everything! It should be clear by now that a commit looks like:
objectstore.get_session().commit()
It also can be called with a list of objects; in this form, the commit operation will be limited only to the objects specified in the list, as well as any child objects within private
relationships for a delete operation:
# saves only user1 and address2. all other modified
# objects remain present in the session.
objectstore.get_session().commit(user1, address2)
This second form of commit should be used more carefully as it will not necessarily locate other dependent objects within the session, whose database representation may have foreign constraint relationships with the objects being operated upon.
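To illustrate the caveat (a sketch only, reusing the User/Address classes from earlier), committing only a child object whose parent is also brand new leaves out the row the child depends on:

    u = User('jack', 'secret')
    a = Address('12 Elm Street', 'anytown', 'UT', '76543')
    u.addresses.append(a)

    # committing only the Address does not pull in the new User it depends upon
    objectstore.get_session().commit(a)

    # safer: include every related object, or just commit everything
    objectstore.get_session().commit(u, a)
    objectstore.get_session().commit()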
The purpose of the Commit operation, as defined by the objectstore
package, is to instruct the Unit of Work to analyze its lists of modified objects, assemble them into a dependency graph, fire off the appropriate INSERT, UPDATE, and DELETE statements via the mappers related to those objects, and to synchronize column-based object attributes that correspond directly to updated/inserted database columns.
It's important to note that the objectstore.get_session().commit() operation is not the same as the commit() operation on SQLEngine. A SQLEngine
, described in the Database Engines documentation, has its own begin
and commit
statements which deal directly with transactions opened on DBAPI connections. While the session.commit()
makes use of these calls in order to issue its own SQL within a database transaction, it is only dealing with "committing" its own in-memory changes and only has an indirect relationship with database connection objects.
The session.commit()
operation also does not affect any relation
-based object attributes (that is, attributes that reference other objects or lists of other objects) in any way.
This means, if you set address.user_id
to 5, that integer attribute will be saved, but it will not place an Address
object in the addresses
attribute of the corresponding User
object. In some cases there may be a lazy-loader still attached to an object attribute which, when first accessed, performs a fresh load from the database and creates the appearance of this behavior, but this behavior should not be relied upon as it is specific to lazy loading and also may disappear in a future release. Similarly, if the Address
object is marked as deleted and a commit is issued, the correct DELETE statements will be issued, but if the object instance itself is still attached to the User
, it will remain.
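A short sketch of the kind of mismatch this produces, assuming user and address are already-loaded mapped instances:

    # setting the foreign key attribute directly saves the integer value,
    # but does not place the Address inside any User's 'addresses' list
    address.user_id = 5
    objectstore.commit()

    # similarly, deleting the child issues the DELETE, but the in-memory
    # reference on the parent remains until removed by hand
    session = objectstore.get_session()
    session.delete(address)
    session.commit()
    # user.addresses may still contain 'address' here and should be
    # cleaned up by hand (e.g. user.addresses.remove(address))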
So the primary guideline for dealing with commit() is: the developer is responsible for maintaining in-memory objects and their relationships to each other, while the unit of work is responsible for maintaining the database representation of the in-memory objects. The typical pattern is that the manipulation of objects is the way that changes get communicated to the unit of work, so that when the commit occurs, the objects are already in their correct in-memory representation and problems don't arise. The manipulation of identifier attributes like integer key values, as well as deletes in particular, are a frequent source of confusion.
A terrific feature of SQLAlchemy which is also a supreme source of confusion is the backreference feature, described earlier in the data mapping material. This feature allows two types of objects to maintain attributes that reference each other, typically one object maintaining a list of elements of the other side, which contains a scalar reference to the list-holding object. When you append an element to the list, the element gets a "backreference" back to the object which has the list. When you attach the list-holding element to the child element, the child element gets attached to the list. This feature has nothing whatsoever to do with the Unit of Work. *
It is strictly a small convenience feature intended to support the developer's manual manipulation of in-memory objects, and the backreference operation happens at the moment objects are attached or removed to/from each other, independent of any kind of database operation. It does not change the golden rule, that the developer is responsible for maintaining in-memory object relationships.
*
there is an internal relationship between two relations
that have a backreference, which states that a change operation is only logged once to the unit of work instead of as two separate changes, since the two changes are "equivalent"; so a backreference does affect the information that is sent to the Unit of Work. But the Unit of Work itself has no knowledge of this arrangement and has no ability to affect it.
The delete call places an object or objects into the Unit of Work's list of objects to be marked as deleted:
# mark three objects to be deleted
objectstore.get_session().delete(obj1, obj2, obj3)

# commit
objectstore.get_session().commit()
When objects which contain references to other objects are deleted, the mappers for those related objects will issue UPDATE statements for those objects that should no longer contain references to the deleted object, setting foreign key identifiers to NULL. Similarly, when a mapper contains relations with the private=True
option, DELETE statements will be issued for objects within that relationship in addition to that of the primary deleted object; this is called a cascading delete.
As stated before, the purpose of delete is strictly to issue DELETE statements to the database. It does not affect the in-memory structure of objects, other than changing the identifying attributes on objects, such as setting foreign key identifiers on updated rows to None. It has no effect on the status of references between object instances, nor any effect on the Python garbage-collection status of objects.
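For example (a sketch, assuming the User mapper's 'addresses' relation was configured with private=True), deleting a parent cascades to its children:

    session = objectstore.get_session()

    # mark the parent User for deletion
    session.delete(myuser)

    # the commit issues a DELETE for the User row and, because the relation
    # is private=True, DELETEs for each of its Address rows as well
    session.commit()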
To clear out the current thread's UnitOfWork, which has the effect of discarding the Identity Map and the lists of all objects that have been modified, just issue a clear:
# via module
objectstore.clear()

# or via Session
objectstore.get_session().clear()
This is the easiest way to "start fresh", as in a web application that wants to have a newly loaded graph of objects on each request. Any object instances created before the clear operation should either be discarded or at least not used with any Mapper or Unit Of Work operations (with the exception of import_instance()
), as they no longer have any relationship to the current Unit of Work, and their behavior with regards to the current session is undefined.
To assist with the Unit of Work's "sticky" behavior, individual objects can have all of their attributes immediately re-loaded from the database, or marked as "expired" which will cause a re-load to occur upon the next access of any of the object's mapped attributes. This includes all relationships, so lazy-loaders will be re-initialized, eager relationships will be repopulated. Any changes marked on the object are discarded:
# immediately re-load attributes on obj1, obj2
session.refresh(obj1, obj2)

# expire objects obj1, obj2, attributes will be reloaded
# on the next access:
session.expire(obj1, obj2, obj3)
Expunge simply removes all record of an object from the current Session. This includes the identity map, and all history-tracking lists:
session.expunge(obj1)
Use expunge
when you'd like to remove an object altogether from memory, such as before calling del
on it, which will prevent any "ghost" operations from occurring when the session is committed.
The _instance_key attribute placed on object instances is designed to work with objects that are serialized into strings and brought back again. As it contains no references to internal structures or database connections, applications that use caches or session storage which require serialization (i.e. pickling) can store SQLAlchemy-loaded objects. However, as mentioned earlier, an object with a particular database identity is only allowed to exist uniquely within the current unit-of-work scope. So, upon deserializing such an object, it has to "check in" with the current Session. This is achieved via the import_instance()
method:
# deserialize an object
myobj = pickle.loads(mystring)

# "import" it. if the objectstore already had this object in the
# identity map, then you get back the one from the current session.
myobj = session.import_instance(myobj)
Note that the import_instance() function will either mark the deserialized object as the official copy in the current identity map, which includes updating its _instance_key with the current application's class instance, or it will discard it and return the corresponding object that was already present. That's why it's important to receive the return result from the method and use it as the official object instance.
The "scope" of the unit of work commit can be controlled further by issuing a begin(). A begin operation constructs a new UnitOfWork object and sets it as the currently used UOW. It maintains a reference to the original UnitOfWork as its "parent", and shares the same identity map of objects that have been loaded from the database within the scope of the parent UnitOfWork. However, the "new", "dirty", and "deleted" lists are empty. This has the effect that only changes that take place after the begin() operation get logged to the current UnitOfWork, and therefore those are the only changes that get commit()ted. When the commit is complete, the "begun" UnitOfWork removes itself and places the parent UnitOfWork as the current one again. The begin() method returns a transactional object, upon which you can call commit() or rollback(). Only this transactional object controls the transaction - commit() upon the Session will do nothing until commit() or rollback() is called upon the transactional object.
# modify an object
myobj1.foo = "something new"

# begin
trans = session.begin()

# modify another object
myobj2.lala = "something new"

# only 'myobj2' is saved
trans.commit()
begin/commit supports the same "nesting" behavior as the SQLEngine (note this behavior is not the original "nested" behavior), meaning that many begin() calls can be made, but only the outermost transactional object will actually perform a commit(). Similarly, calls to the commit() method on the Session, which might occur in function calls within the transaction, will not do anything; this allows an external function caller to control the scope of transactions used within the functions.
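A sketch of this nesting behavior, using hypothetical objects myobj1 and myobj2 that are already attached to the thread-local session:

    session = objectstore.get_session()

    outer = session.begin()
    inner = session.begin()

    myobj1.foo = 'change one'
    inner.commit()       # inner transactional object; nothing is saved yet

    myobj2.bar = 'change two'
    session.commit()     # a plain session commit inside a begin(); also a no-op

    outer.commit()       # the outermost commit actually saves both changes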
The UOW commit operation places its INSERT/UPDATE/DELETE operations within the scope of a database transaction controlled by a SQLEngine:
engine.begin()
try:
    pass    # run objectstore update operations
except:
    engine.rollback()
    raise
engine.commit()
If you recall from the Transactions section, the engine's begin()/commit() methods support reentrant behavior. This means you can nest begin and commits and only have the outermost begin/commit pair actually take effect (rollbacks however, abort the whole operation at any stage). From this it follows that the UnitOfWork commit operation can be nested within a transaction as well:
engine.begin()
try:
    # perform custom SQL operations
    objectstore.commit()
    # perform custom SQL operations
except:
    engine.rollback()
    raise
engine.commit()
Sessions can be created on an ad-hoc basis and used for individual groups of objects and operations. This has the effect of bypassing the normal thread-local Session and explicitly using a particular Session:
# make a new Session with a global UnitOfWork
s = objectstore.Session()

# make objects bound to this Session
x = MyObj(_sa_session=s)

# perform mapper operations bound to this Session
# (this function coming soon)
r = MyObj.mapper.using(s).select_by(id=12)

# get the session that corresponds to an instance
s = objectstore.get_session(x)

# commit
s.commit()

# perform a block of operations with this session set within the current scope
objectstore.push_session(s)
try:
    r = mapper.select_by(id=12)
    x = MyObj()
    objectstore.commit()
finally:
    objectstore.pop_session()
Sessions also now support a "nested transaction" feature whereby a second Session can use a different database connection. This can be used inside of a larger database transaction to issue commits to the database that will be committed independently of the larger transaction's status:
engine.begin()
try:
    a = MyObj()
    b = MyObj()

    sess = Session(nest_on=engine)
    objectstore.push_session(sess)
    try:
        c = MyObj()
        objectstore.commit()    # will commit "c" to the database,
                                # even if the external transaction rolls back
    finally:
        objectstore.pop_session()

    objectstore.commit()        # commit "a" and "b" to the database
    engine.commit()
except:
    engine.rollback()
    raise
For users who want to make their own Session subclass, or replace the algorithm used to return scoped Session objects (i.e. the objectstore.get_session() method):
# make a new Session
s = objectstore.Session()

# set it as the current thread-local session
objectstore.session_registry.set(s)

# set the objectstore's session registry to a different algorithm
def create_session():
    """creates new sessions"""
    return objectstore.Session()

def mykey():
    """creates contextual keys to store scoped sessions"""
    return "mykey"

objectstore.session_registry = sqlalchemy.util.ScopedRegistry(createfunc=create_session, scopefunc=mykey)
The objectstore module can log an extensive display of its "commit plans", which is a graph of its internal representation of objects before they are committed to the database. To turn this logging on:
# make an engine with echo_uow
engine = create_engine('myengine...', echo_uow=True)
Commits will then dump to the standard output displays like the following:
Task dump:

 UOWTask(6034768) 'User/users/6015696'
  |
  |- Save elements
  |- Save: UOWTaskElement(6034800): User(6016624) (save)
  |
  |- Save dependencies
  |- UOWDependencyProcessor(6035024) 'addresses' attribute on saved User's (UOWTask(6034768) 'User/users/6015696')
  |       |-UOWTaskElement(6034800): User(6016624) (save)
  |
  |- Delete dependencies
  |- UOWDependencyProcessor(6035056) 'addresses' attribute on User's to be deleted (UOWTask(6034768) 'User/users/6015696')
  |       |-(no objects)
  |
  |- Child tasks
  |- UOWTask(6034832) 'Address/email_addresses/6015344'
  |   |
  |   |- Save elements
  |   |- Save: UOWTaskElement(6034864): Address(6034384) (save)
  |   |- Save: UOWTaskElement(6034896): Address(6034256) (save)
  |   |----
  |
  |----
The above graph can be read straight downwards to determine the order of operations. It indicates "save User 6016624, process each element in the 'addresses' list on User 6016624, save Address 6034384, Address 6034256".
This section details all the options available to Mappers, as well as advanced patterns.
To start, here's the tables we will work with again:
from sqlalchemy import *
db = create_engine('sqlite://filename=mydb', echo=True)

# a table to store users
users = Table('users', db,
    Column('user_id', Integer, primary_key = True),
    Column('user_name', String(40)),
    Column('password', String(80))
)

# a table that stores mailing addresses associated with a specific user
addresses = Table('addresses', db,
    Column('address_id', Integer, primary_key = True),
    Column('user_id', Integer, ForeignKey("users.user_id")),
    Column('street', String(100)),
    Column('city', String(80)),
    Column('state', String(2)),
    Column('zip', String(10))
)

# a table that stores keywords
keywords = Table('keywords', db,
    Column('keyword_id', Integer, primary_key = True),
    Column('name', VARCHAR(50))
)

# a table that associates keywords with users
userkeywords = Table('userkeywords', db,
    Column('user_id', INT, ForeignKey("users")),
    Column('keyword_id', INT, ForeignKey("keywords"))
)
When creating relations on a mapper, most examples so far have illustrated the mapper and relationship joining up based on the foreign keys of the tables they represent. In fact, this "automatic" inspection can be completely circumvented using the primaryjoin and secondaryjoin arguments to relation, as in this example which creates a User object which has a relationship to all of its Addresses which are in Boston:
class User(object):
    pass
class Address(object):
    pass

Address.mapper = mapper(Address, addresses)
User.mapper = mapper(User, users, properties={
    'boston_addresses' : relation(Address.mapper, primaryjoin=
                and_(users.c.user_id==Address.c.user_id,
                Address.c.city=='Boston'))
})
Many to many relationships can be customized by one or both of primaryjoin and secondaryjoin, shown below with just the default many-to-many relationship explicitly set:
class User(object):
    pass
class Keyword(object):
    pass

Keyword.mapper = mapper(Keyword, keywords)

# the userkeywords association table is passed as the "secondary" argument
User.mapper = mapper(User, users, properties={
    'keywords' : relation(Keyword.mapper, userkeywords,
        primaryjoin=users.c.user_id==userkeywords.c.user_id,
        secondaryjoin=userkeywords.c.keyword_id==keywords.c.keyword_id
    )
})
The previous example leads in to the idea of joining against the same table multiple times. Below is a User object that has lists of its Boston and New York addresses, both lazily loaded when they are first accessed:
User.mapper = mapper(User, users, properties={
    'boston_addresses' : relation(Address.mapper, primaryjoin=
                and_(users.c.user_id==Address.c.user_id,
                Address.c.city=='Boston')),
    'newyork_addresses' : relation(Address.mapper, primaryjoin=
                and_(users.c.user_id==Address.c.user_id,
                Address.c.city=='New York')),
})
A complication arises with the above pattern if you want the relations to be eager loaded. Since there will be two separate joins to the addresses table during an eager load, an alias needs to be used to separate them. You can create an alias of the addresses table to separate them, but then you are in effect creating a brand new mapper for each property, unrelated to the main Address mapper, which can create problems with commit operations. So an additional argument use_alias can be used with an eager relationship to specify the alias to be used just within the eager query:
User.mapper = mapper(User, users, properties={
    'boston_addresses' : relation(Address.mapper, primaryjoin=
                and_(User.c.user_id==Address.c.user_id,
                Address.c.city=='Boston'),
                lazy=False, use_alias=True),
    'newyork_addresses' : relation(Address.mapper, primaryjoin=
                and_(User.c.user_id==Address.c.user_id,
                Address.c.city=='New York'),
                lazy=False, use_alias=True),
})

u = User.mapper.select()
By default, mappers will not supply any ORDER BY clause when selecting rows. This can be modified in several ways.
A "default ordering" can be supplied by all mappers, by enabling the "default_ordering" flag to the engine, which indicates that table primary keys or object IDs should be used as the default ordering:
db = create_engine('postgres://username=scott&password=tiger', default_ordering=True)
The "order_by" parameter can be sent to a mapper, overriding the per-engine ordering if any. A value of None means that the mapper should not use any ordering, even if the engine's default_ordering property is True. A non-None value, which can be a column, an asc or desc clause, or an array of either one, indicates the ORDER BY clause that should be added to all select queries:
# disable all ordering
mapper = mapper(User, users, order_by=None)

# order by a column
mapper = mapper(User, users, order_by=users.c.user_id)

# order by multiple items
mapper = mapper(User, users, order_by=[users.c.user_id, desc(users.c.user_name)])
"order_by" can also be specified to an individual select method, overriding all other per-engine/per-mapper orderings:
# order by a column
l = mapper.select(users.c.user_name=='fred', order_by=users.c.user_id)

# order by multiple criterion
l = mapper.select(users.c.user_name=='fred', order_by=[users.c.user_id, desc(users.c.user_name)])
For relations, the "order_by" property can also be specified to all forms of relation:
# order address objects by address id
mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), order_by=addresses.c.address_id)
})

# eager load with ordering - the ORDER BY clauses of parent/child will be organized properly
mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), order_by=desc(addresses.c.email_address), eager=True)
}, order_by=users.c.user_id)
You can limit rows in a regular SQL query by specifying limit and offset. A Mapper can handle the same concepts:
class User(object):
    pass

m = mapper(User, users)
r = m.select(limit=20, offset=10)
class User(object):
    pass
class Address(object):
    pass

m = mapper(User, users, properties={
    'addresses' : relation(mapper(Address, addresses), lazy=False)
})

r = m.select(User.c.user_name.like('F%'), limit=20, offset=10)
The main WHERE clause as well as the limiting clauses are coerced into a subquery; this subquery represents the desired result of objects. A containing query, which handles the eager relationships, is joined against the subquery to produce the result.
When mappers are constructed, by default the column names in the Table metadata are used as the names of attributes on the mapped class. This can be customized within the properties by stating the key/column combinations explicitly:
user_mapper = mapper(User, users, properties={
    'id' : users.c.user_id,
    'name' : users.c.user_name,
})
In the situation when column names overlap in a mapper against multiple tables, columns may be referenced together with a list:
# join users and addresses
usersaddresses = sql.join(users, addresses, users.c.user_id == addresses.c.user_id)

m = mapper(User, usersaddresses, properties = {
    'id' : [users.c.user_id, addresses.c.user_id],
})
This feature allows particular columns of a table to not be loaded by default, instead being loaded later on when first referenced. It is essentially "column-level lazy loading". This feature is useful when one wants to avoid loading a large text or binary field into memory when it's not needed. Individual columns can be lazy loaded by themselves or placed into groups that lazy-load together.
book_excerpts = Table('books', db,
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo', Binary)
)

class Book(object):
    pass

# define a mapper that will load each of 'excerpt' and 'photo' in
# separate, individual-row SELECT statements when each attribute
# is first referenced on the individual object instance
book_mapper = mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo' : deferred(book_excerpts.c.photo)
})
Deferred columns can be placed into groups so that they load together:
book_excerpts = Table('books', db,
    Column('book_id', Integer, primary_key=True),
    Column('title', String(200), nullable=False),
    Column('summary', String(2000)),
    Column('excerpt', String),
    Column('photo1', Binary),
    Column('photo2', Binary),
    Column('photo3', Binary)
)

class Book(object):
    pass

# define a mapper with a 'photos' deferred group. when one photo is referenced,
# all three photos will be loaded in one SELECT statement. The 'excerpt' will
# be loaded separately when it is first referenced.
book_mapper = mapper(Book, book_excerpts, properties = {
    'excerpt' : deferred(book_excerpts.c.excerpt),
    'photo1' : deferred(book_excerpts.c.photo1, group='photos'),
    'photo2' : deferred(book_excerpts.c.photo2, group='photos'),
    'photo3' : deferred(book_excerpts.c.photo3, group='photos')
})
The options method of mapper, first introduced earlier in this document, supports the copying of a mapper into a new one, with any number of its relations replaced by new ones. The method takes a variable number of MapperOption objects which know how to change specific things about the mapper. The five available options are eagerload, lazyload, noload, deferred and extension.
An example of a mapper with a lazy load relationship, upgraded to an eager load relationship:
class User(object):
    pass
class Address(object):
    pass

# a 'lazy' relationship
User.mapper = mapper(User, users, properties = {
    'addresses' : relation(mapper(Address, addresses), lazy=True)
})

# copy the mapper and convert 'addresses' to be eager
eagermapper = User.mapper.options(eagerload('addresses'))
The load options also can take keyword arguments that apply to the new relationship. To take the "double" address lazy relationship from the previous section and upgrade it to eager, adding the "selectalias" keywords as well:
m = User.mapper.options(
    eagerload('boston_addresses', selectalias='boston_ad'),
    eagerload('newyork_addresses', selectalias='newyork_ad')
)
The defer and undefer options can control the deferred loading of attributes:
# set the 'excerpt' deferred attribute to load normally
m = book_mapper.options(undefer('excerpt'))

# set the referenced mapper 'photos' to defer its loading of the column 'imagedata'
m = book_mapper.options(defer('photos.imagedata'))
Options can also take a limited set of keyword arguments which will be applied to a new mapper. For example, to create a mapper that refreshes all objects loaded each time:
m2 = mapper.options(always_refresh=True)
Or, a mapper with different ordering:
m2 = mapper.options(order_by=[newcol])
Table Inheritance indicates the pattern where two tables, in a parent-child relationship, are mapped to an inheritance chain of classes. If a table "employees" is supplemented by a table "managers" containing additional manager-specific information, a corresponding object inheritance pattern would have an Employee class and a Manager class. Loading a Manager object means you are joining managers to employees. For SQLAlchemy, this pattern is just a special case of a mapper that maps against a joined relationship, and is provided via the inherits keyword.
class User(object): """a user object.""" pass User.mapper = mapper(User, users) class AddressUser(User): """a user object that also has the users mailing address.""" pass # define a mapper for AddressUser that inherits the User.mapper, and joins on the user_id column AddressUser.mapper = mapper( AddressUser, addresses, inherits=User.mapper ) items = AddressUser.mapper.select()
Above, the join condition is determined via the foreign keys between the users and the addresses table. To specify the join condition explicitly, use inherit_condition:
AddressUser.mapper = mapper(
        AddressUser,
        addresses, inherits=User.mapper,
        inherit_condition=users.c.user_id==addresses.c.user_id
    )
The more general case of the pattern described in "table inheritance" is a mapper that maps against more than one table. The join keyword from the SQL package creates a neat selectable unit comprised of multiple tables, complete with its own composite primary key, which can be passed in to a mapper as the table.
# a class
class AddressUser(object):
    pass

# define a Join
j = join(users, addresses)

# map to it - the identity of an AddressUser object will be
# based on (user_id, address_id) since those are the primary keys involved
m = mapper(AddressUser, j)
# many-to-many join on an association table
j = join(users, userkeywords,
        users.c.user_id==userkeywords.c.user_id).join(keywords,
        userkeywords.c.keyword_id==keywords.c.keyword_id)

# a class
class KeywordUser(object):
    pass

# map to it - the identity of a KeywordUser object will be
# (user_id, keyword_id) since those are the primary keys involved
m = mapper(KeywordUser, j)
Similar to mapping against a join, a plain select() object can be used with a mapper as well. Below, an example select which contains two aggregate functions and a group_by is mapped to a class:
s = select([customers,
            func.count(orders.c.customer_id).label('order_count'),
            func.max(orders.c.price).label('highest_order')],
            customers.c.customer_id==orders.c.customer_id,
            group_by=[c for c in customers.c]
           )

class Customer(object):
    pass

m = mapper(Customer, s)
Above, the "customers" table is joined against the "orders" table to produce a full row for each customer row, the total count of related rows in the "orders" table, and the highest price in the "orders" table, grouped against the full set of columns in the "customers" table. That query is then mapped against the Customer class. New instances of Customer will contain attributes for each column in the "customers" table as well as an "order_count" and "highest_order" attribute. Updates to the Customer object will only be reflected in the "customers" table and not the "orders" table. This is because the primary keys of the "orders" table are not represented in this mapper and therefore the table is not affected by save or delete operations.
By now it should be apparent that the mapper defined for a class is in no way the only mapper that exists for that class. Other mappers can be created at any time, either explicitly or via the options method, to provide different loading behavior.
However, it's not as simple as that. The mapper serves a dual purpose: one is to generate select statements and load objects from executing those statements; the other is to keep track of the defined dependencies of that object when save and delete operations occur, and to extend the attributes of the object so that they store information about their history and communicate with the unit of work system. For this reason, it is a good idea to be aware of the behavior of multiple mappers. When creating dependency relationships between objects, one should ensure that only the primary mappers are used in those relationships, else deep object traversal operations will fail to load in the expected properties, and update operations will not take all the dependencies into account.
Generally it's as simple as this: the first mapper that is defined for a particular class is the one that gets to define that class's relationships to other mapped classes, and also decorates its attributes and constructors with special behavior. Any subsequent mappers created for that class will be able to load new instances, but object manipulation operations will still function via the original mapper. The special keyword is_primary will override this behavior, and make any mapper the new "primary" mapper.
class User(object):
    pass

# mapper one - mark it as "primary", meaning this mapper will handle
# saving and class-level properties
m1 = mapper(User, users, is_primary=True)

# mapper two - this one will also eager-load address objects in
m2 = mapper(User, users, properties={
        'addresses' : relation(mapper(Address, addresses), lazy=False)
    })

# get a user.  this user will not have an 'addresses' property
u1 = m1.select(User.c.user_id==10)

# get another user.  this user will have an 'addresses' property.
u2 = m2.select(User.c.user_id==27)

# make some modifications, including adding an Address object.
u1.user_name = 'jack'
u2.user_name = 'jane'
u2.addresses.append(Address('123 green street'))

# upon commit, the User objects will be saved.
# the Address object will not, since the primary mapper for User
# does not have an 'addresses' relationship defined
objectstore.commit()
Oftentimes it is necessary for two mappers to be related to each other. With a datamodel that consists of Users that store Addresses, you might have an Address object and want to access the "user" attribute on it, or have a User object and want to get the list of Address objects. The easiest way to do this is via the backreference keyword described earlier. However, even when backreferences are used, it is sometimes necessary to explicitly specify the relations on both mappers pointing to each other.
Doing this explicitly involves creating the first mapper by itself, then creating the second mapper with a reference to the first, then adding a property to the first mapper that references the second:
class User(object):
    pass
class Address(object):
    pass

User.mapper = mapper(User, users)

Address.mapper = mapper(Address, addresses, properties={
    'user':relation(User.mapper)
})

User.mapper.add_property('addresses', relation(Address.mapper))
Note that with a circular relationship as above, you cannot declare both relationships as "eager" relationships, since that produces a circular query situation which will generate a recursion exception. So what if you want to load an Address and its User eagerly? Just make a second mapper using options:
eagermapper = Address.mapper.options(eagerload('user'))
s = eagermapper.select(Address.c.address_id==12)
A self-referential mapper is a mapper that is designed to operate with an adjacency list table. This is a table that contains one or more foreign keys back to itself, and is usually used to create hierarchical tree structures. SQLAlchemy's default model of saving items based on table dependencies is not sufficient in this case, as an adjacency list table introduces dependencies between individual rows. Fortunately, SQLAlchemy will automatically detect a self-referential mapper and do the extra lifting to make it work.
# define a self-referential table
trees = Table('treenodes', engine,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
    )

# treenode class
class TreeNode(object):
    pass

# mapper defines "children" property, pointing back to TreeNode class,
# with the mapper unspecified.  it will point back to the primary
# mapper on the TreeNode class.
TreeNode.mapper = mapper(TreeNode, trees, properties={
        'children' : relation(
                        TreeNode,
                        private=True
                     ),
        }
    )

# or, specify the circular relationship after establishing the original mapper:
mymapper = mapper(TreeNode, trees)

mymapper.add_property('children', relation(
                        mymapper,
                        private=True
                     ))
This kind of mapper goes through a lot of extra effort when saving and deleting items, to determine the correct dependency graph of nodes within the tree.
A self-referential mapper where there is more than one relationship on the table requires that all join conditions be explicitly spelled out. Below is a self-referring table that contains a "parent_node_id" column to reference parent/child relationships, and a "root_node_id" column which points child nodes back to the ultimate root node:
# define a self-referential table with several relations
trees = Table('treenodes', engine,
    Column('node_id', Integer, primary_key=True),
    Column('parent_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('root_node_id', Integer, ForeignKey('treenodes.node_id'), nullable=True),
    Column('node_name', String(50), nullable=False),
    )

# treenode class
class TreeNode(object):
    pass

# define the "children" property as well as the "root" property
TreeNode.mapper = mapper(TreeNode, trees, properties={
        'children' : relation(
                        TreeNode,
                        primaryjoin=trees.c.parent_node_id==trees.c.node_id,
                        private=True
                     ),
        'root' : relation(
                        TreeNode,
                        primaryjoin=trees.c.root_node_id==trees.c.node_id,
                        foreignkey=trees.c.node_id,
                        uselist=False
                     )
        }
    )
The "root" property on a TreeNode is a many-to-one relationship. By default, a self-referential mapper declares relationships as one-to-many, so the extra parameter foreignkey, pointing to the "many" side of a relationship, is needed to indicate a "many-to-one" self-referring relationship.
Both TreeNode examples above are available in functional form in the examples/adjacencytree directory of the distribution.
Take any result set and feed it into a mapper to produce objects. Multiple mappers can be combined to retrieve unrelated objects from the same row in one step. The instances method on mapper takes a ResultProxy object, which is the result type generated from SQLEngine, and delivers object instances.
class User(object):
    pass

User.mapper = mapper(User, users)

# select users
c = users.select().execute()

# get objects
userlist = User.mapper.instances(c)
# define a second class/mapper
class Address(object):
    pass

Address.mapper = mapper(Address, addresses)

# select users and addresses in one query
s = select([users, addresses], users.c.user_id==addresses.c.user_id)

# execute it, and process the results with the User mapper, chained to the Address mapper
r = User.mapper.instances(s.execute(), Address.mapper)

# result rows are an array of objects, one for each mapper used
for entry in r:
    user = entry[0]
    address = entry[1]
Other arguments not covered above include:
_sa_entity='name'
to tag them to the appropriate mapper.

Mappers can have their functionality augmented or replaced at many points in their execution via the usage of the MapperExtension class. This class is just a series of "hooks" where various functionality takes place. An application can make its own MapperExtension objects, overriding only the methods it needs.
class MapperExtension(object):
    def create_instance(self, mapper, row, imap, class_):
        """called when a new object instance is about to be created from a row.
        the method can choose to create the instance itself, or it can return
        None to indicate normal object creation should take place.

        mapper - the mapper doing the operation
        row - the result row from the database
        imap - a dictionary that is storing the running set of objects collected
        from the current result set
        class_ - the class we are mapping.
        """
    def append_result(self, mapper, row, imap, result, instance, isnew, populate_existing=False):
        """called when an object instance is being appended to a result list.
        If it returns True, it is assumed that this method handled the appending itself.

        mapper - the mapper doing the operation
        row - the result row from the database
        imap - a dictionary that is storing the running set of objects collected
        from the current result set
        result - an instance of util.HistoryArraySet(), which may be an attribute
        on an object if this is a related object load (lazy or eager).  use
        result.append_nohistory(value) to append objects to this list.
        instance - the object instance to be appended to the result
        isnew - indicates if this is the first time we have seen this object
        instance in the current result set.  if you are selecting from a join,
        such as an eager load, you might see the same object instance many times
        in the same result set.
        populate_existing - usually False, indicates if object instances that were
        already in the main identity map, i.e. were loaded by a previous select(),
        get their attributes overwritten
        """
    def before_insert(self, mapper, instance):
        """called before an object instance is INSERTed into its table.

        this is a good place to set up primary key values and such that arent
        handled otherwise."""
    def after_insert(self, mapper, instance):
        """called after an object instance has been INSERTed"""
    def before_delete(self, mapper, instance):
        """called before an object instance is DELETEed"""
To use MapperExtension, make your own subclass of it and just send it off to a mapper:
mapper = mapper(User, users, extension=MyExtension())
An existing mapper can create a copy of itself using an extension via the extension option:
extended_mapper = mapper.options(extension(MyExtension()))
This section is a quick summary of what's going on when you send a class to the mapper() function. This material is not required in order to use SQLAlchemy; it is a little more dense, and should be approached patiently!
The primary changes to a class that is mapped involve attaching property objects to it which represent table columns. These property objects essentially track changes. In addition, the __init__ method of the object is decorated to track object creates.
Here is a quick rundown of all the changes in code form:
# step 1 - override __init__ to 'register_new' with the Unit of Work
oldinit = myclass.__init__
def init(self, *args, **kwargs):
    nohist = kwargs.pop('_mapper_nohistory', False)
    oldinit(self, *args, **kwargs)
    if not nohist:
        # register_new with Unit Of Work
        objectstore.uow().register_new(self)
myclass.__init__ = init

# step 2 - set a string identifier that will
# locate the classes' primary mapper
myclass._mapper = mapper.hashkey

# step 3 - add column accessor
myclass.c = mapper.columns

# step 4 - attribute decorating.
# this happens mostly within the package sqlalchemy.attributes

# this dictionary will store a series of callables
# that generate "history" containers for
# individual object attributes
myclass._class_managed_attributes = {}

# create individual properties for each column -
# these objects know how to talk
# to the attribute package to create appropriate behavior.
# the next example examines the attributes package more closely.
myclass.column1 = SmartProperty().property('column1', uselist=False)
myclass.column2 = SmartProperty().property('column2', uselist=True)
The attribute package is used when save operations occur to get a handle on modified values. In the example below, a full round-trip attribute tracking operation is illustrated:
import sqlalchemy.attributes as attributes

# create an attribute manager.
# the sqlalchemy.mapping package keeps one of these around as
# 'objectstore.global_attributes'
manager = attributes.AttributeManager()

# regular old new-style class
class MyClass(object):
    pass

# register a scalar and a list attribute
manager.register_attribute(MyClass, 'column1', uselist=False)
manager.register_attribute(MyClass, 'column2', uselist=True)

# create/modify an object
obj = MyClass()
obj.column1 = 'this is a new value'
obj.column2.append('value1')
obj.column2.append('value2')

# get history objects
col1_history = manager.get_history(obj, 'column1')
col2_history = manager.get_history(obj, 'column2')

# whats new ?
>>> col1_history.added_items()
['this is a new value']
>>> col2_history.added_items()
['value1', 'value2']

# commit changes
manager.commit(obj)

# the new values become the "unchanged" values
>>> col1_history.added_items()
[]
>>> col1_history.unchanged_items()
['this is a new value']
>>> col2_history.added_items()
[]
>>> col2_history.unchanged_items()
['value1', 'value2']
The above AttributeManager also includes a method value_changed which is triggered whenever change events occur on the managed object attributes. The Unit of Work (objectstore) package overrides this method in order to receive change events; it's essentially this:
import sqlalchemy.attributes as attributes

class UOWAttributeManager(attributes.AttributeManager):
    def value_changed(self, obj, key, value):
        if hasattr(obj, '_instance_key'):
            uow().register_dirty(obj)
        else:
            uow().register_new(obj)

global_attributes = UOWAttributeManager()
Objects that contain the attribute "_instance_key" are already registered with the Identity Map, and are assumed to have come from the database. They therefore get marked as "dirty" when changes happen. Objects without an "_instance_key" are not from the database, and get marked as "new" when changes happen, although usually this will already have occurred via the object's __init__ method.
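As a rough illustration of the distinction (a sketch only; the get() call below is assumed to load an object by primary key through the mapper):

# a brand new object - it has no '_instance_key', so __init__ (and any
# later attribute changes) register it as "new" with the Unit of Work
u = User()
u.user_name = 'ed'

# an object loaded from the database carries an '_instance_key', so the
# same kind of attribute change registers it as "dirty" instead
existing = User.mapper.get(10)
existing.user_name = 'edward'

# the INSERT for 'u' and the UPDATE for 'existing' both occur here
objectstore.commit()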
The package sqlalchemy.types defines the datatype identifiers which may be used when defining metadata. This package includes a set of generic types, a set of SQL-specific subclasses of those types, and a small extension system used by specific database connectors to adapt these generic types into database-specific type objects.
SQLAlchemy comes with a set of standard generic datatypes, which are defined as classes.
The standard set of generic types are:
class String(TypeEngine):
    def __init__(self, length=None)

class Integer(TypeEngine)

class SmallInteger(Integer)

class Numeric(TypeEngine):
    def __init__(self, precision=10, length=2)

class Float(Numeric):
    def __init__(self, precision=10)

# DateTime, Date, and Time work with Python datetime objects
class DateTime(TypeEngine)
class Date(TypeEngine)
class Time(TypeEngine)

class Binary(TypeEngine):
    def __init__(self, length=None)

class Boolean(TypeEngine)

# converts unicode strings to raw bytes
# as bind params, raw bytes to unicode as
# rowset values, using the unicode encoding
# setting on the engine (defaults to 'utf-8')
class Unicode(TypeDecorator)

# uses the pickle protocol to serialize data
# in/out of Binary columns
class PickleType(TypeDecorator)
More specific subclasses of these types are available, which various database engines may choose to implement specifically, allowing finer grained control over types:
class FLOAT(Numeric)
class TEXT(String)
class DECIMAL(Numeric)
class INT(Integer)
INTEGER = INT
class TIMESTAMP(DateTime)
class DATETIME(DateTime)
class CLOB(String)
class VARCHAR(String)
class CHAR(String)
class BLOB(Binary)
class BOOLEAN(Boolean)
When using a specific database engine, these types are adapted even further via a set of database-specific subclasses defined by the database engine. There may eventually be more type objects that are defined for specific databases. An example of this would be Postgres' Array type.
Type objects are specified to table meta data using either the class itself, or an instance of the class. Creating an instance of the class allows you to specify parameters for the type, such as string length, numerical precision, etc.:
mytable = Table('mytable', engine,
    # define type using a class
    Column('my_id', Integer, primary_key=True),

    # define type using an object instance
    Column('value', Numeric(7,4))
)
User-defined types can be created, to support either database-specific types, or customized pre-processing of query parameters as well as post-processing of result set data. You can make your own classes to perform these operations. To augment the behavior of a TypeEngine type, such as String, the TypeDecorator class is used:
import sqlalchemy.types as types

class MyType(types.TypeDecorator):
    """basic type that decorates String, prefixes values with "PREFIX:" on
    the way in and strips it off on the way out."""
    impl = types.String
    def convert_bind_param(self, value, engine):
        return "PREFIX:" + value
    def convert_result_value(self, value, engine):
        return value[7:]
The Unicode and PickleType classes are themselves subclasses of TypeDecorator and can be subclassed directly.
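For instance, a hypothetical Unicode subclass (LowercaseUnicode is not part of the library) might add its own processing before handing off to the standard Unicode conversion:

import sqlalchemy.types as types

class LowercaseUnicode(types.Unicode):
    """hypothetical Unicode subclass that lower-cases values on the way in,
    then applies the normal Unicode encoding behavior."""
    def convert_bind_param(self, value, engine):
        if value is not None:
            value = value.lower()
        return types.Unicode.convert_bind_param(self, value, engine)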
To build a type object from scratch, which will not have a corresponding database-specific implementation, subclass TypeEngine:
import sqlalchemy.types as types

class MyType(types.TypeEngine):
    def __init__(self, precision=8):
        self.precision = precision
    def get_col_spec(self):
        return "MYTYPE(%s)" % self.precision
    def convert_bind_param(self, value, engine):
        return value
    def convert_result_value(self, value, engine):
        return value
the schema module provides the building blocks for database metadata. This means all the entities within a SQL database that we might want to look at, modify, or create and delete are described by these objects, in a database-agnostic way.
A structure of SchemaItems also provides a "visitor" interface which is the primary means by which other parts of the library operate upon the schema. The SQL package extends this structure with its own clause-specific objects as well as the visitor interface, so that the schema package "plugs in" to the SQL package.
represents a column in a database table. this is a subclass of sql.ColumnClause and represents an actual existing column in the database, in a similar fashion as TableClause/Table.
A plain default value on a column. this could correspond to a constant, a callable function, or a SQL clause.
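A sketch of how a default might be attached to a column, assuming the default keyword argument to Column wraps a constant or callable in a ColumnDefault:

import datetime

news = Table('news', engine,
    Column('news_id', Integer, primary_key=True),

    # constant default, applied at INSERT time
    Column('status', String(20), default='draft'),

    # callable default, evaluated at INSERT time
    Column('created', DateTime, default=datetime.datetime.now)
)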
defines a ForeignKey constraint between two columns. ForeignKey is specified as an argument to a Column object.
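For example, following the pattern used in the tree examples above, a column in one table can reference the primary key of another:

addresses = Table('addresses', engine,
    Column('address_id', Integer, primary_key=True),

    # 'user_id' references the primary key of the 'users' table
    Column('user_id', Integer, ForeignKey('users.user_id')),
    Column('street', String(100))
)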
Represents an index of columns from a database table
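A minimal sketch of constructing an index against an existing table column (the exact constructor arguments and the create() call are assumptions here):

# index the user_name column of the users table
i = Index('idx_user_name', users.c.user_name, unique=True)
i.create()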
a default that takes effect on the database side
a factory object used to create implementations for schema objects. This object is the ultimate base class for the engine.SQLEngine class.
represents a sequence, which applies to Oracle and Postgres databases.
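A sketch of attaching a Sequence to a primary key column, where it will be used on databases that support sequences:

info = Table('info', engine,
    # the sequence generates primary key values on Postgres and Oracle
    Column('info_id', Integer, Sequence('info_id_seq'), primary_key=True),
    Column('data', String(100))
)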
represents a relational database table. This subclasses sql.TableClause to provide a table that is "wired" to an engine. Whereas TableClause represents a table as it's used in a SQL expression, Table represents a table as it's created in the database. Be sure to look at sqlalchemy.sql.TableImpl for additional methods defined on a Table.
Defines the SQLEngine class, which serves as the primary "database" object used throughout the sql construction and object-relational mapper packages. A SQLEngine is a facade around a single connection pool corresponding to a particular set of connection parameters, and provides thread-local transactional methods and statement execution methods for Connection objects. It also provides a facade around a Cursor object to allow richer column selection for result rows as well as type conversion operations, known as a ResultProxy.
A SQLEngine is provided to an application as a subclass that is specific to a particular type of DBAPI, and is the central switching point for abstracting different kinds of database behavior into a consistent set of behaviors. It provides a variety of factory methods to produce everything specific to a certain kind of database, including a Compiler as well as schema creation/dropping objects.
The term "database-specific" will be used to describe any object or function that has behavior corresponding to a particular vendor, such as mysql-specific, sqlite-specific, etc.
represents a handle to the SQLEngine's connection pool. The default SQLSession maintains a distinct connection during transactions, otherwise returns connections newly retrieved from the pool each time. The Pool is usually configured to have use_threadlocal=True so if a particular connection is already checked out, you'll get that same connection in the same thread. There can also be a "unique" SQLSession pushed onto the engine, which returns a connection via the unique_connection() method on Pool; this allows nested transactions to take place, or other operations upon more than one connection at a time.
The central "database" object used by an application. Subclasses of this object is used by the schema and SQL construction packages to provide database-specific behaviors, as well as an execution and thread-local transaction context. SQLEngines are constructed via the create_engine() function inside this package.
wraps a DBAPI cursor object to provide access to row columns based on integer position, case-insensitive column name, or by schema.Column object, e.g.:

row = fetchone()
col1 = row[0]                    # access via integer position
col2 = row['col2']               # access via name
col3 = row[mytable.c.mycol]      # access via Column object

ResultProxy also contains a map of TypeEngine objects and will invoke the appropriate convert_result_value() method before returning columns.
defines the base components of SQL expression trees.
represents a dictionary/iterator of bind parameter key names/values. Includes parameters compiled with a Compiled object as well as additional arguments passed to the Compiled object's get_params() method. Parameter values will be converted as per the TypeEngine objects present in the bind parameter objects. The non-converted value can be retrieved via the get_original method. For Compiled objects that compile positional parameters, the values() iteration of the object will return the parameter values in the correct order.
represents a compiled SQL expression. the __str__ method of the Compiled object should produce the actual text of the statement. Compiled objects are specific to the database library that created them, and also may or may not be specific to the columns referenced within a particular set of bind parameters. In no case should the Compiled object be dependent on the actual values of those bind parameters, even though it may reference those values as defaults.
base class for elements of a programmatically constructed SQL expression.
represents a textual column clause in a SQL statement. May or may not be bound to an underlying Selectable.
provides a connection pool implementation, which optionally manages connections on a thread local basis. Also provides a DBAPI2 transparency layer so that pools can be managed automatically, based on module type and connect arguments, simply by calling regular DBAPI connect() methods.
proxies a DBAPI2 connect() call to a pooled connection keyed to the specific connect parameters.
uses Queue.Queue to maintain a fixed-size list of connections.
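A rough sketch of constructing a QueuePool directly around a connection-creating callable, rather than going through the manage() facade; the keyword arguments and the connect() checkout method shown here are typical pool settings and are assumptions about the exact signature:

import sqlalchemy.pool as pool
import psycopg2 as psycopg

def getconn():
    # plain DBAPI connection creator
    return psycopg.connect(database='test', user='scott', password='tiger')

# a fixed-size pool handing out thread-local connections
p = pool.QueuePool(getconn, pool_size=5, use_threadlocal=True)

# check a connection out of the pool; it is returned to the pool
# when the proxy is de-referenced
conn = p.connect()
cursor = conn.cursor()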
Maintains one connection per thread, never moving it to another thread. This is used for SQLite and other databases with a similar restriction.
the mapper package provides object-relational functionality, building upon the schema and sql packages and tying operations to class properties and constructors.
Persists object instances to and from schema.Table objects via the sql package. Instances of this class should be constructed through this package's mapper() or relation() function.
encapsulates the object-fetching operations provided by Mappers.
provides the Session object and a function-oriented convenience interface. This is the "front-end" to the Unit of Work system in unitofwork.py. Issues of "scope" are dealt with here, primarily through an important function "get_session()", which is where mappers and units of work go to get a handle on the current thread-local context.
returned by Session.begin(), denotes a transactionalized UnitOfWork instance. call commit() on this to commit the transaction.
raised for all those conditions where invalid arguments are sent to constructed objects. This error generally corresponds to construction time state errors.
corresponds to internal state being detected in an invalid condition
sqlalchemy was asked to do something it can't do, such as return nonexistent data. This error generally corresponds to runtime state errors.
raised when the execution of a SQL statement fails. includes accessors for the underlying exception, as well as the SQL and bind parameters.
An SQLEngine proxy that automatically connects when necessary.
SQLEngine proxy. Supports lazy and late initialization by delegating to a real engine (set with connect()), and using proxy classes for TypeEngine.
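A rough sketch of typical ProxyEngine usage, assuming it lives in sqlalchemy.ext.proxy and that its connect() method accepts the same arguments as create_engine():

from sqlalchemy.ext.proxy import ProxyEngine

# define table metadata against the proxy before a real database is chosen
engine = ProxyEngine()

users = Table('users', engine,
    Column('user_id', Integer, primary_key=True),
    Column('user_name', String(40))
)

# later, at application startup, attach a real engine to the proxy;
# the connection string here is illustrative
engine.connect('sqlite://filename=querytest.db')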