Cheshire3 Installation
The following instructions should walk you through installing Cheshire3 and its prerequisites from scratch under Linux (or any Unix). If you have trouble at any stage, feel free to contact us.
This is the easy, preferred, method for installing Cheshire3.
These are the requirements for Cheshire3, if you want to install everything by hand.
The links below let you check whether there is a more recent version than the one in our build packages. Note that newer versions will not have been tested and won't necessarily be supported, but might include useful bug fixes.
Package | Minimum | Current | Location | Note |
---|---|---|---|---|
expat | 1.95.8 | 1.95.8 | http://sourceforge.net/projects/expat/ | (see note) |
BerkeleyDB | 4.0 | 4.4.20 | http://www.sleepycat.com/ | |
Python | 2.3.0 | 2.4.3 | http://www.python.org/ | |
4Suite | 1.0a3-cvs | 1.0b3 | http://sourceforge.net/projects/foursuite/ | |
ZSI | 1.5-cvs | 1.7 | http://sourceforge.net/projects/pywebsvcs/ | 2.0 not supported |
PyZ3950 | | 2.06 | http://www.panix.com/~asl2/software/PyZ3950/ | |
python-dateutil | 0.9 | 1.1 | http://labix.org/python-dateutil | |
SRW | 1.1 | 1.1-3 | http://srw.cheshire3.org/downloads/ | |
libxml2 | 2.6.10 | 2.6.26 | http://www.xmlsoft.org/ | |
libxslt | 1.1.8 | 1.1.17 | http://www.xmlsoft.org/ | |
lxml | 1.0.1 | 1.0.3 | http://codespeak.net/lxml/ | |
numarray | 1.5.1 | 1.5.1 | http://www.stsci.edu/resources/software_hardware/numarray | |
TextIndexNG | 2.1 | 3.1.9 | http://www.zopyx.com/OpenSource/TextIndexNG | |
apache | 2.0.42 | 2.0.58 | http://httpd.apache.org/ | |
mod_python | 3.1.0 | 3.2.10 | http://www.modpython.org/ | |
Cheshire3 | 0.9 | 0.9.9 | http://www.cheshire3.org/ | |
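When checking an already installed package against the Minimum column, a small helper for comparing dotted version numbers may be handy. The following is only a minimal sketch in POSIX shell: `version_ge` is a hypothetical name (it is not part of Cheshire3 or build.sh), and it only understands purely numeric versions, so suffixes such as `1.0a3-cvs` still need to be compared by eye.

```shell
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Hypothetical helper; handles numeric fields only (e.g. 2.4.3, 1.95.8).
version_ge() {
  _a=$1
  _b=$2
  while [ -n "$_a" ] || [ -n "$_b" ]; do
    # Take the leading field of each version.
    _x=${_a%%.*}
    _y=${_b%%.*}
    # Drop that field; if no dot is left, the remainder is empty.
    [ "$_a" = "$_x" ] && _a= || _a=${_a#*.}
    [ "$_b" = "$_y" ] && _b= || _b=${_b#*.}
    # A missing field counts as 0, so 4.0 compares like 4.0.0.
    _x=${_x:-0}
    _y=${_y:-0}
    [ "$_x" -gt "$_y" ] && return 0
    [ "$_x" -lt "$_y" ] && return 1
  done
  return 0
}
```

For example, `version_ge 2.4.3 2.3.0 && echo "Python is recent enough"` prints the message, while `version_ge 1.95.7 1.95.8` fails.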
To ensure that you have the most recent version of these instructions, it is suggested that you also follow along in the build.sh script.
Below is a set of instructions to install all of the requirements in user space, rather than globally. If you want to install globally, omit the --prefix option from the configure commands. The example install location is /home/cheshire/install and the source is decompressed in /home/cheshire/build.
Before embarking on the process below, you'll need to have a C compiler (we strongly recommend GCC) and a make utility installed, along with the appropriate libraries. You probably already do, but just in case, these utilities can often be found under 'Development Tools' in the package management applications provided by *nix distributions.
If you don't install everything in one session, you'll need to ensure that the environment variables are reset:
export CPPFLAGS=-I/home/cheshire/install/include
export LDFLAGS=-L/home/cheshire/install/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cheshire/install/lib
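One way to avoid retyping these in every session is to keep them in a small file and source it. The sketch below assumes a hypothetical file such as `~/cheshire3-env.sh` and a `PREFIX` variable; neither name comes from build.sh. It also guards against adding the library directory to `LD_LIBRARY_PATH` twice.

```shell
# Hypothetical helper file, e.g. ~/cheshire3-env.sh, to be sourced in each new session.
# PREFIX is an assumed variable name; its default is the example location used in this guide.
PREFIX="${PREFIX:-/home/cheshire/install}"

export CPPFLAGS="-I$PREFIX/include"
export LDFLAGS="-L$PREFIX/lib"

# Append the library directory only if it is not already listed,
# so that sourcing the file twice does not duplicate the entry.
case ":${LD_LIBRARY_PATH:-}:" in
  *":$PREFIX/lib:"*) ;;
  *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$PREFIX/lib" ;;
esac
```

With that in place, `. ~/cheshire3-env.sh` at the start of a session restores the build environment.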
Expat is the XML parser library that everything links to, so you'll need to install this first. Libxml2 is an alternative parser, but you'll need expat regardless as it's linked by Apache, Python and 4Suite.
./configure --prefix=/home/cheshire/install
make
make install
Python 2.4+ and 4Suite 1.0a4+ both include expat version 1.95.8. Previously the included versions were different, and this could cause problems running under Apache. The 'minimum' versions have now been updated to these, but if you want to hack the same version of expat into Python/4Suite, or do not run under Apache, then previous versions will be okay. You do not need to install this package if it is already present on your system. Most *nix distributions include Expat. Check that the installed version is 1.95.8; if not, you probably want to install 1.95.8 globally rather than local to cheshire.
BerkeleyDB is a very fast transactional database system. It's used by such giants as eBay, Amazon and IBM. For the purposes of Cheshire3, it is 10+ times faster than a relational database used for the same job. It's used by all of the Store interfaces (indexes, records, configurations and other objects).
cd build_unix
../dist/configure --prefix=/home/cheshire/install
make
make install
BerkeleyDB is already present on most Linux systems. If version 4 or later is installed, you can skip this step.
Python is the language that all of the main operational functions are written in, as opposed to the raw number crunching which is mostly done by C libraries. It's easy to understand and maintain, enabling developers to get right into the nitty gritty if desired, but without significant sacrifices to performance.
MacOSX note: use --enable-framework in the configure to build as a framework.
./configure --prefix=/home/cheshire/install
make
make install
4Suite is the best XML processing library available at the current time for Python. We use it for XPath and XSLT processing, as well as most DOM creation. See libxml2 for an alternative, but currently it does not support SAX2 under Python, nor does it produce unicode objects, just strings.
python ./setup.py build
python ./setup.py install
ZSI is the best Python SOAP toolkit. The most recent version (1.5) comes with a WSDL compiler, however it's not yet quite up to SRW.
The most recent version of ZSI either requires PyXML to be installed (which is not otherwise required for Cheshire3, and can conflict with 4Suite which is a better package) or to use the CVS version. The CVS version is available in the Cheshire3 FTP site.
python ./setup.py build
python ./setup.py install
TextIndexNG includes a wrapper around the Snowball stemming language. It provides interfaces to stemmers which reduce a word down to its lexical stem, eg 'princesses' to 'princess' or 'understanding' to 'understand'. TextIndexNG comes with a lot of other code, which is also installed, but not used. The author of the package was approached regarding splitting out the stemming library, but was not amenable to the suggestion.
python ./setup.py install
This package is required even if you don't want to enable Z39.50 interfaces as it contains the CQL libraries used in all C3 queries. It's also used by the Z3950SearchDocumentStream as a Z client.
You may need to install lex and yacc by hand first, as these are required to build the ASN.1 compiler.
cp lex.py yacc.py /home/cheshire/install/lib/python2.4/site-packages/
python ./setup.py install
This very small package contains the stubs for ZSI as well as a quick SRW demo client in python. If you're not going to enable an SRW/U interface, or the SRWSearchDocumentStream, you can omit it.
python ./setup.py install
The DateUtils code provides an excellent free text date parser (though it doesn't currently handle multiple dates in the same block of text).
python ./setup.py install
This library is faster than expat and comes with its own XPath and XSLT implementations. It's not required if you don't feel like it, but it does parse XML really fast!
./configure --prefix=/home/cheshire/install --with-python
make
make install
cd python
python ./setup.py install
A companion library to libxml2 to process XSLT.
./configure --prefix=/home/cheshire/install --with-python
make
make install
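Once the Python libraries above are installed, a quick sanity check is to try importing each of them with the interpreter you built. The following is a sketch rather than part of the official build: `check_modules` and the `PYTHON` variable are hypothetical names, and the module names in the usage note are the usual import names for these packages, not something taken from build.sh.

```shell
# PYTHON is an assumed variable naming the interpreter built earlier in this guide.
PYTHON="${PYTHON:-/home/cheshire/install/bin/python}"

# check_modules MODULE...: try to import each module, report the result,
# and return non-zero if any import failed. Hypothetical helper.
check_modules() {
  _failed=0
  for _mod in "$@"; do
    if "$PYTHON" -c "import $_mod" >/dev/null 2>&1; then
      echo "ok       $_mod"
    else
      echo "MISSING  $_mod"
      _failed=1
    fi
  done
  return "$_failed"
}
```

For example, `check_modules Ft ZSI PyZ3950 dateutil libxml2 libxslt` checks 4Suite (imported as `Ft`), ZSI, PyZ3950, python-dateutil and the libxml2/libxslt bindings in one go.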
Try to make sure that it links against the version of expat you just installed by checking the output of configure to see where it is linking. Apache is only required if you want to have a remote interface to the Cheshire3 databases, e.g. by SRW/U, OAI or Z39.50. However, using regular CGI calls rather than mod_python handlers (below) will be much slower, as the infrastructure takes a second or so to configure and instantiate.
export CPPFLAGS=-I/home/cheshire/install/include
export LDFLAGS=-L/home/cheshire/install/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cheshire/install/lib
./configure --prefix=/home/cheshire/install --enable-mods=all \
  --with-berkeley-db=/home/cheshire/install --enable-suexec
make
make install
Apache is also generally present on most systems; however, you must ensure that it is run with the right environment variables so that it links against the libraries that have been installed. You'll also need to ensure that the user Apache runs as has read (and potentially write) access to the databases in which the index and record data are maintained.
If you haven't installed Apache, you can skip this section. Mod_python allows Apache to run Python code internally to handle connections and requests. Each Apache thread gets its own Python interpreter, which is only started once and left running. This means that the Cheshire3 architecture only needs to be built once, rather than per invocation.
./configure --prefix=/home/cheshire/install --with-python=/home/cheshire/install/bin/python2.4 --with-apxs=/home/cheshire/install/bin/apxs
make
make install
The Parallel Virtual Machine library is a very fast, low transaction cost parallelization system. It lets you run processes on multiple machines and compiles for multiple platforms. As Python and hence Cheshire3 will run on multiple platforms without any additional effort this means that you can build a completely heterogeneous cluster without any difficulties.
[Coming]
This is the Python wrapper around the PVM library.
[Coming]
If you run into issues with the 'sort' utility breaking, check you have the latest version of textutils installed.
./configure --prefix=/home/cheshire/install
make
make install
Environment variables required if you have installed this as a local user.
export LD_LIBRARY_PATH=/home/cheshire/install/lib
export LD_RUN_PATH=/home/cheshire/install/lib
Also ensure that Apache is run with these environment variables (envvars / envvars-std files with httpd binary)
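As an illustration, the same settings can be appended to Apache's envvars file (in a default Apache 2.0 build this is bin/envvars under the install prefix, sourced by apachectl). The helper below is a sketch under that assumption; `add_envvars` is a hypothetical name, and it skips the append if the library directory is already mentioned in the file.

```shell
# add_envvars FILE LIBDIR: append library path settings to an envvars-style file.
# Hypothetical helper; does nothing if LIBDIR already appears in FILE.
add_envvars() {
  _file=$1
  _libdir=$2
  if grep -qF "$_libdir" "$_file" 2>/dev/null; then
    return 0
  fi
  # \$ keeps the variable reference literal so it is expanded when Apache starts.
  cat >> "$_file" <<EOF
LD_LIBRARY_PATH="$_libdir\${LD_LIBRARY_PATH:+:\$LD_LIBRARY_PATH}"
LD_RUN_PATH="$_libdir"
export LD_LIBRARY_PATH LD_RUN_PATH
EOF
}
```

For the example layout in this guide, that would be `add_envvars /home/cheshire/install/bin/envvars /home/cheshire/install/lib`, followed by a restart of Apache.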
Include conf/cheshire3.conf
And then in cheshire3.conf:
    # Load mod_python
    LoadModule python_module modules/mod_python.so

    # SRW/U interface at /srw/dbname
    <Directory /home/cheshire/install/htdocs/srw>
      SetHandler mod_python
      PythonDebug On
      PythonPath "['/home/cheshire/cheshire3/code']+sys.path"
      PythonHandler srwApacheHandler
    </Directory>

    # Z3950 interface on 2100
    Listen 2100
    <VirtualHost *:2100>
      PythonPath "['/home/cheshire/cheshire3/code']+sys.path"
      PythonConnectionHandler zApacheHandler
      PythonDebug On
    </VirtualHost>
See further documentation.
If you get strange errors from mod_python under Linux, first try restarting Apache. If this fails with a No space left on device error when there obviously is space, then you've hit the semaphore problem.
The fix is:
echo "512 32000 32 512" > /proc/sys/kernel/sem
Or see:
http://clarens.sourceforge.net/index.php?docs+faq
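To see what the four numbers in /proc/sys/kernel/sem currently are before changing them, something like the sketch below can help. The field order (SEMMSL, SEMMNS, SEMOPM, SEMMNI) is the one documented for the Linux proc filesystem; `show_sem` is a hypothetical helper name. Note also that a value written with echo does not survive a reboot; on most distributions a `kernel.sem` entry in /etc/sysctl.conf is the usual way to make it permanent.

```shell
# show_sem [FILE]: print the four System V semaphore limits with their kernel names.
# Hypothetical helper; FILE defaults to the Linux proc entry.
show_sem() {
  read -r _semmsl _semmns _semopm _semmni < "${1:-/proc/sys/kernel/sem}" || return 1
  echo "SEMMSL=$_semmsl"   # maximum semaphores per semaphore set
  echo "SEMMNS=$_semmns"   # maximum semaphores system-wide
  echo "SEMOPM=$_semopm"   # maximum operations per semop call
  echo "SEMMNI=$_semmni"   # maximum number of semaphore sets
}
```

Running `show_sem` before and after applying the fix shows whether the new limits took effect, and `ipcs -s` lists the semaphore sets currently allocated.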