Module baseObjects

Source Code for Module baseObjects

# Version 0.9.1

from configParser import C3Object

class Server(C3Object):
    """A Server object is a collection point for other objects, and an
    initial entry into the system for requests from a
    ProtocolHandler. A server might know about several Databases,
    RecordStores and so forth, but its main function is to check
    whether the request should be accepted or not and create an
    environment in which the request can be processed. It will likely
    have access to a UserStore database which maintains authentication
    and authorisation information. The exact nature of this
    information is not defined, allowing many possible backend
    implementations. Servers are the top level of configuration for
    the system and hence their constructor requires the path to a
    local XML configuration file. From then on, however, configuration
    information may be retrieved from other locations such as a remote
    datastore to enable distributed environments to maintain
    synchronicity."""

    databases = {}
    authStore = None
    resultSetStore = None
    queryStore = None

    def __init__(self, session, configFile="serverConfig.xml"):
        """The constructor takes a Session object and a file path to
        the configuration file to be parsed and processed.
        """
        raise(NotImplementedError)

    def connect(self, session, connection):
        """Accept a connection to the server and perform any per session initialisation."""
        raise(NotImplementedError)

    def disconnect(self, session, req):
        """Called when a connection is dropped to perform any end of session tidying."""
        raise(NotImplementedError)

    def authenticate(self, session, req):
        "Handle authentication of session/request and establish user profile."
        raise(NotImplementedError)


class Database(C3Object):
    """A Database is a collection of Records and Indexes. It is
    responsible for maintaining and allowing access to its components,
    as well as metadata associated with the collections. It must be
    able to interpret a request, splitting it amongst its known
    resources and then recombine the values into a single response."""
    indexes = {}
    protocolMaps = {}
    recordStore = None

    def add_record(self, session, record):
        """Ensure that a record is registered with the database. This
        function does not ensure persistence of the record, nor index
        it; it only performs registration."""
        raise(NotImplementedError)
    def remove_record(self, session, req):
        """Unregister the record."""
        raise(NotImplementedError)

    def index_record(self, session, record):
        """Sends the record to all indexes registered with the database to be indexed."""
        raise(NotImplementedError)
    def reindex(self, session):
        """Reindex all records registered with the database."""
        raise(NotImplementedError)
    def unindex_record(self, session, record):
        """Sends the record to all indexes registered with the database to be removed/unindexed."""
        raise(NotImplementedError)

    def begin_indexing(self, session):
        """Perform tasks before records are to be indexed."""
        raise(NotImplementedError)
    def commit_indexing(self, session):
        """Perform tasks after records have been sent to indexes. For
        example, commit any temporary data to IndexStores."""
        raise(NotImplementedError)
    def commit_metadata(self, session):
        """Ensure persistence of database metadata."""
        raise(NotImplementedError)

    def scan(self, session, query, numReq, direction=">="):
        """Given a CQL query (single searchClause), resolve the index
        and return a section of the term list."""
        raise(NotImplementedError)
    def search(self, session, query):
        """Given a CQL query, perform the query and return a resultSet."""
        raise(NotImplementedError)
    def sort(self, session, resultSets, sortKeys):
        """Take one or more resultSets and sort/merge by sortKeys."""
        raise(NotImplementedError)

    def authenticate(self, session, user):
        """Authenticate user against database."""
        raise(NotImplementedError)
    def connect(self, session):
        """Perform database specific session initialisation."""
        raise(NotImplementedError)
    def disconnect(self, session):
        """Perform any database specific session tidying on completion."""
        raise(NotImplementedError)



class Index(C3Object):
    """An Index is an object which defines an access point into
    records and is responsible for extracting that information from
    them. It can then store the information extracted in an
    IndexStore. The entry point can be defined using one or more XPath
    expressions, and the extraction process can be defined using a
    workflow chain of standard objects. These chains must start with
    an Extractor, but from there might then include PreParsers,
    Parsers, Transformers, Normalizers and even other Indexes. The
    index can also be the last object in a regular Workflow, so long
    as an XPath object is used to find the data in the record
    immediately before an Extractor."""
    indexStore = None

    def begin_indexing(self, session):
        """Perform tasks before indexing any records."""
        raise(NotImplementedError)
    def commit_indexing(self, session):
        """Perform tasks after records have been indexed."""
        raise(NotImplementedError)
    def index_record(self, session, record):
        """Accept a record to index. If begin_indexing has been
        called, the index might not commit any data until
        commit_indexing is called. If it is not in batch mode, then
        index_record will also commit the terms to the indexStore."""
        raise(NotImplementedError)
    def delete_record(self, session, record):
        """Delete all the terms of the given record from the indexes.
        Does this by extracting the terms from the record, finding and
        removing them. Hence the record must be the same as the one
        that was indexed."""
        raise(NotImplementedError)

    def scan(self, session, clause):
        """Produce an ordered term list with document frequencies and total occurrences."""
        raise(NotImplementedError)
    def search(self, session, clause):
        """Search this particular index given a CQL clause, return a resultSet object."""
        raise(NotImplementedError)
    def sort(self, session, resultSet):
        """Sort a result set based on the values extracted according to this index."""
        raise(NotImplementedError)

    def serialise_terms(self, session, termId, terms, recs=0, occs=0):
        """Callback from IndexStore to serialise list of terms and
        document references to be stored."""
        raise(NotImplementedError)
    def deserialise_terms(self, session, data):
        """Callback from IndexStore to take serialised data and
        produce list of terms and document references."""
        raise(NotImplementedError)
    def merge_terms(self, session, structTerms, newTerms, op="replace", recs=0, occs=0):
        """Callback from IndexStore to take two sets of terms and merge them together."""
        raise(NotImplementedError)
    def construct_item(self, session, term, rsitype=""):
        """Take a single item, as stored in this Index, and produce a ResultSetItem representation."""
        raise(NotImplementedError)
    def construct_resultSet(self, session, terms, queryHash={}):
        """Take a list of terms and produce an appropriate ResultSet object."""
        raise(NotImplementedError)



class XPathProcessor(C3Object):
    """An XPathProcessor is a simple wrapper around an XPath. It is used
    to evaluate the XPath expression according to a given record in
    workflows."""
    def process_record(self, session, record):
        """Process the XPath for the given record and return the results."""
        raise(NotImplementedError)


# Takes some data, returns a list of extracted values
class Extractor(C3Object):
    """An Extractor is a processing object called by an Index with the
    value of an evaluated XPath expression or with a string. Example
    extractors might extract keywords from an element or the entire
    contents thereof as a single string. Extractors must also be used
    on the query terms to apply the same keyword processing rules, for
    example."""
    def process_string(self, session, data):
        """Process a raw string, eg from an attribute value or the query."""
        raise(NotImplementedError)
    def process_node(self, session, data):
        """Process a DOM node."""
        raise(NotImplementedError)
    def process_eventList(self, session, data):
        """Process a list of SAX events serialised in C3 internal format."""
        raise(NotImplementedError)
    def process_xpathResult(self, session, data):
        """Process the result of an XPath expression. Convenience
        function to wrap the other process_* functions and do type
        checking."""
        raise(NotImplementedError)
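
# Illustration only: a minimal sketch of a concrete Extractor. It assumes
# that process_string returns a plain list of terms, as the comment above the
# class suggests. KeywordExtractor and the stand-in Extractor base below are
# hypothetical names, not part of this module.

```python
import re


class Extractor(object):
    """Stand-in for the abstract Extractor (hypothetical, so the sketch runs alone)."""

    def process_string(self, session, data):
        raise NotImplementedError


class KeywordExtractor(Extractor):
    """Hypothetical subclass that splits a raw string into keyword terms."""

    def process_string(self, session, data):
        # Assumption: a "keyword" is any run of word characters.
        return re.findall(r"\w+", data)


# The session argument is unused in this sketch, so None is passed.
extracted = KeywordExtractor().process_string(None, "the cat and the hat")
```

# The same extractor would be applied to query terms so that records and
# queries are split by identical rules.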

# Takes a string, returns a list of normalised values
class Normalizer(C3Object):
    """Normalizer objects are chained after Extractors in order to
    transform the data from the record or query. A Normalizer might
    standardise case, perform stemming or transform a date into
    ISO8601 format. Normalizers are also needed to transform the terms
    in a request into the same format as the term in the Index. For
    example a date index might be searched using a free text date and
    that would need to be parsed into the normalised form in order to
    compare it with the stored data."""

    def process_string(self, session, data):
        """Process a string into an alternative form."""
        raise(NotImplementedError)

    def process_hash(self, session, data):
        """Process a hash of values into alternative forms."""
        raise(NotImplementedError)
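
# Illustration only: a minimal sketch of a case-standardising Normalizer.
# The layout of the hash passed to process_hash is not defined in this
# module; the sketch assumes a dict mapping each term to an occurrence count.
# CaseNormalizer and the stand-in Normalizer base are hypothetical names.

```python
class Normalizer(object):
    """Stand-in for the abstract Normalizer (hypothetical, so the sketch runs alone)."""

    def process_string(self, session, data):
        raise NotImplementedError

    def process_hash(self, session, data):
        raise NotImplementedError


class CaseNormalizer(Normalizer):
    """Hypothetical subclass that lower-cases terms."""

    def process_string(self, session, data):
        return data.lower()

    def process_hash(self, session, data):
        # Assumption: data maps term -> occurrence count. Terms that
        # collide after normalisation have their counts summed.
        result = {}
        for term, count in data.items():
            key = self.process_string(session, term)
            result[key] = result.get(key, 0) + count
        return result


normalizer = CaseNormalizer()
lowered = normalizer.process_string(None, "Cheshire")
merged = normalizer.process_hash(None, {"Cat": 2, "cat": 1, "HAT": 1})
```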

# Takes raw data, returns one or more Documents

class DocumentFactory(C3Object):
    """ Object Docs """
    def load(self, session, data, cache=None, format=None, tag=None, documentType=None, codec=""):
        raise(NotImplementedError)
    def get_document(self, session, idx=-1):
        raise(NotImplementedError)

class QueryFactory(C3Object):
    pass


# Takes a Document, returns a Record
class Parser(C3Object):
    """Normally a simple wrapper around an XML parser, these objects
    can be viewed as Record Factories. They take a Document containing
    some XML and produce the equivalent Record."""
    def process_document(self, session, doc):
        """Take a Document, parse it and return a Record object."""
        raise(NotImplementedError)

# Takes a Document, returns a Document
class PreParser(C3Object):
    """A PreParser takes a Document and creates a second one. For
    example, the input document might consist solely of a URL. The
    output would be a Document with the data that the PreParser has
    fetched from that address. This functionality allows for workflow
    chains to be strung together in many ways, and perhaps in ways
    which the original implementation had not foreseen."""
    def process_document(self, session, doc):
        """Take a Document, transform it and return a new Document object."""
        raise(NotImplementedError)


# Takes a Record, returns a Document
class Transformer(C3Object):
    """A Transformer is the opposite of a Parser. It takes a Record
    and produces a Document. In many cases this can be handled by an
    XSLT implementation but other instances might include one that
    returns a binary file based on the information in the
    record. Transformers might be used in an indexing chain, but are
    more likely to be used to render a record in a format or schema
    requested by the end user."""
    def process_record(self, session, rec):
        """Take a Record, transform it and return a new Document object."""
        raise(NotImplementedError)
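
# Illustration only: a sketch of how a PreParser, a Parser and a Transformer
# might be chained (Document -> Document -> Record -> Document). Every class
# below is a hypothetical stand-in written so the sketch runs on its own; the
# parser uses the standard library's xml.etree, which is an assumption and
# not necessarily what an actual implementation uses.

```python
from xml.etree import ElementTree


class Document(object):
    """Stand-in Document holding raw data (hypothetical)."""

    def __init__(self, data):
        self.data = data

    def get_raw(self):
        return self.data


class Record(object):
    """Stand-in Record holding a parsed tree (hypothetical)."""

    def __init__(self, dom):
        self.dom = dom

    def get_dom(self):
        return self.dom


class StripPreParser(object):
    """Document -> Document: trims surrounding whitespace."""

    def process_document(self, session, doc):
        return Document(doc.get_raw().strip())


class EtreeParser(object):
    """Document -> Record: parses the XML text into an element tree."""

    def process_document(self, session, doc):
        return Record(ElementTree.fromstring(doc.get_raw()))


class TitleTransformer(object):
    """Record -> Document: renders the text of the <title> element."""

    def process_record(self, session, rec):
        return Document(rec.get_dom().findtext("title"))


# Run the chain; the session argument is unused here, so None is passed.
obj = Document("  <book><title>Alice</title></book>\n")
for step in (StripPreParser(), EtreeParser()):
    obj = step.process_document(None, obj)
rendered = TitleTransformer().process_record(None, obj)
```

# Because each step only depends on the process_document/process_record
# signatures, steps can be reordered or swapped without touching the others.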


# Users instantiated out of AuthStore like any other configured object
class User(C3Object):
    """An object representing a user of the system to allow for
    convenient access to properties such as username, password and
    rights metadata."""

    username = ''
    password = ''
    rights = []
    email = ''
    realName = ''
    flags = []

    def hasFlag(self, session, flag, object=None):
        """Check whether or not the user has the specified flag. This
        flag may be set regarding a particular object, for example
        write access to a particular store."""
        raise(NotImplementedError)


class ProtocolMap(C3Object):
    """Protocol maps map from an incoming query type to internal indexes based on some specification."""
    protocol = ""
    def resolve_index(self, session, data):
        """Given a query, resolve it and return the index object to be used."""
        raise(NotImplementedError)


# --- Store APIs ---

# Not an object store, we just look after terms and indexes
class IndexStore(C3Object):
    """A persistent storage mechanism for extracted terms."""

    def begin_indexing(self, session, req):
        """Perform tasks as required before indexing begins, for example setting batch files."""
        raise(NotImplementedError)
    def commit_indexing(self, session, req):
        """Commit the data to disk from the indexing process."""
        raise(NotImplementedError)

    def contains_index(self, session, index):
        """Does the IndexStore currently store the given Index."""
        raise(NotImplementedError)
    def create_index(self, session, index):
        """Create an index in the store."""
        raise(NotImplementedError)
    def clean_index(self, session, index):
        """Remove all the terms from an index, but keep the specification."""
        raise(NotImplementedError)
    def delete_index(self, session, index):
        """Completely delete an index from the store."""
        raise(NotImplementedError)

    def fetch_indexList(self, session):
        """Fetch a list of all indexes stored in this IndexStore."""
        raise(NotImplementedError)
    def fetch_indexStats(self, session, index):
        """Fetch statistics (such as size) about the given Index."""
        raise(NotImplementedError)

    def delete_terms(self, session, index, terms, record=None):
        """Delete the given terms from an index, optionally only for a particular record."""
        raise(NotImplementedError)
    def store_terms(self, session, index, terms, record):
        """Store terms in the index for a given record."""
        raise(NotImplementedError)
    def fetch_term(self, session, index, term, summary=0, reverse=0):
        """Fetch a single term."""
        raise(NotImplementedError)
    def fetch_termList(self, session, index, term, numReq=0, relation="", end="", summary=0, reverse=0):
        """Fetch a list of terms for an index.
        numReq:  How many terms are wanted.
        relation:  Which order to scan through the index.
        end:  A point to end at (eg between A and B)
        summary:  Only return frequency info, not the pointers to matching records.
        reverse:  Use the reversed index if available (eg 'xedni' not 'index').
        """
        raise(NotImplementedError)
    def fetch_sortValue(self, session, index, record):
        """Fetch a stored data value for the given record."""
        raise(NotImplementedError)


# Store configured objects
class ObjectStore(C3Object):
    """An interface to a persistent storage mechanism for configured Cheshire3 objects."""
    def create_object(self, session, object=None):
        """Given a Cheshire3 object, create a serialised form of it in the database.
        Note: You should use create_record() as per RecordStore to create an object from a configuration.
        """
        raise(NotImplementedError)
    def delete_object(self, session, id):
        """Delete an object."""
        raise(NotImplementedError)
    def fetch_object(self, session, id):
        """Fetch an object."""
        raise(NotImplementedError)
    def store_object(self, session, object):
        """Store an object, potentially overwriting an existing copy."""
        raise(NotImplementedError)

# Store CQL queries
class QueryStore(ObjectStore):
    """An interface to persistent storage for Queries."""
    def create_query(self, session, query=None):
        """Create a new query in the store."""
        raise(NotImplementedError)
    def delete_query(self, session, id):
        """Delete a query from the store."""
        raise(NotImplementedError)
    def fetch_query(self, session, id):
        """Fetch a query from the store."""
        raise(NotImplementedError)
    def fetch_queryList(self, session, req):
        """Fetch a list of identifiers associated with queries in the store."""
        raise(NotImplementedError)
    def store_query(self, session, object):
        """Store a query, potentially overwriting an existing copy."""
        raise(NotImplementedError)


# Store records
class RecordStore(ObjectStore):
    """A persistent storage mechanism for Records. It allows such
    operations as create, update, fetch and delete. It also allows
    fast retrieval of important record metadata, for use with
    computing relevance ranking for example."""

    def create_record(self, session, record=None):
        """Create a record and assign it a new identifier."""
        raise(NotImplementedError)
    def replace_record(self, session, record):
        """Permissions checking hook for store_record if the record does not already exist."""
        raise(NotImplementedError)
    def store_record(self, session, record):
        """Store an existing record. It must already have an identifier assigned to it."""
        raise(NotImplementedError)
    def fetch_record(self, session, id):
        """Return the record with the given identifier."""
        raise(NotImplementedError)
    def delete_record(self, session, id):
        """Delete the record with the given identifier from storage."""
        raise(NotImplementedError)
    def fetch_recordSize(self, session, id):
        """Return the size of the record, according to its metadata."""
        raise(NotImplementedError)
    def fetch_recordChecksum(self, session, id):
        """Return a checksum for the record, if one is available."""
        raise(NotImplementedError)
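
# Illustration only: a dict-backed sketch of the create/store/fetch/delete
# cycle described by RecordStore. MemoryRecordStore and the stand-in Record
# are hypothetical; sequential integer identifiers and "size = length of the
# XML string" are assumptions made for the example, not documented behaviour.

```python
class Record(object):
    """Stand-in Record (hypothetical); carries only an id and XML text."""

    def __init__(self, xml, id=None):
        self.xml = xml
        self.id = id


class MemoryRecordStore(object):
    """Hypothetical in-memory store that follows the RecordStore method names."""

    def __init__(self):
        self._records = {}
        self._nextId = 0

    def create_record(self, session, record=None):
        # Assumption: identifiers are sequential integers.
        if record is None:
            record = Record("")
        record.id = self._nextId
        self._nextId += 1
        return self.store_record(session, record)

    def store_record(self, session, record):
        # As the docstring above requires, the record must already have an id.
        if record.id is None:
            raise ValueError("record must already have an identifier")
        self._records[record.id] = record
        return record

    def fetch_record(self, session, id):
        return self._records[id]

    def delete_record(self, session, id):
        del self._records[id]

    def fetch_recordSize(self, session, id):
        # Assumption: size is the length of the serialised XML.
        return len(self._records[id].xml)


store = MemoryRecordStore()
rec = store.create_record(None, Record("<a>hi</a>"))
size = store.fetch_recordSize(None, rec.id)
store.delete_record(None, rec.id)
```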

# Store documents
class DocumentStore(ObjectStore):
    """An interface to a persistent storage mechanism for Documents and their associated metadata."""
    def create_document(self, session, doc=None):
        raise(NotImplementedError)
    def delete_document(self, session, id):
        raise(NotImplementedError)
    def fetch_document(self, session, id):
        raise(NotImplementedError)
    def fetch_documentList(self, session, req):
        raise(NotImplementedError)
    def store_document(self, session, doc):
        raise(NotImplementedError)

# And store result sets
class ResultSetStore(ObjectStore):
    """A persistent storage mechanism for result sets."""
    def create_resultSet(self, session, rset=None):
        raise(NotImplementedError)
    def delete_resultSet(self, session, req):
        raise(NotImplementedError)
    def fetch_resultSet(self, session, req):
        raise(NotImplementedError)
    def fetch_resultSetList(self, session, req):
        raise(NotImplementedError)
    def store_resultSet(self, session, req):
        raise(NotImplementedError)


# --- Code Instantiated Objects ---

class Document:
    """A Document is the raw data which will become a record. It may
    be processed into a Record by a Parser, or into another Document
    type by a PreParser. Documents might be stored in a DocumentStore,
    if necessary, but can generally be discarded. Documents may be
    anything from a JPG file, to an unparsed XML file, to a string
    containing a URL. This allows for future compatibility with new
    formats, as they may be incorporated into the system by
    implementing a Document type and a PreParser."""
    id = -1
    documentStore = ""
    text = ""
    mimeType = ""
    processHistory = []
    parent = ('','',-1)

    def __init__(self, data, creator="", history=[], mimeType="", parent=None):
        """The constructor takes the data which should be used to
        construct the document. This is implementation dependent. It
        also optionally may take a creator object, process history
        information and a mimetype. The parent option is for
        documents which have been extracted from another document, for
        example pages from a book."""
        raise(NotImplementedError)
    def get_raw(self):
        """Return the raw data associated with this document."""
        raise(NotImplementedError)


class ResultSet:
    """A collection of records, typically created as the result of a search on a database."""
    termid = -1
    totalOccs = 0
    totalRecs = 0
    id = ""
    expires = ""
    index = None
    queryTerm = ""
    queryFreq = 0
    queryFragment = None
    queryPositions = []
    relevancy = 0
    maxWeight = 0
    minWeight = 0

    def combine(self, session, others, clause):
        raise(NotImplementedError)
    def retrieve(self, session, req):
        raise(NotImplementedError)
    def sort(self, session, req):
        raise(NotImplementedError)

class ResultSetItem(object):
    """An object representing a pointer to a Record, with result set specific metadata."""
    docid = 0
    recordStore = ""
    occurences = 0


class Record:
    """Records in the system are stored in an XML form. Attached to
    the record is various configurable metadata, such as the time it
    was inserted into the database and by which user. Records are
    stored in a RecordStore database and retrieved via a persistent
    and unique document identifier. The record data may be retrieved
    as a list of SAX events, as regularised XML or as a DOM tree."""
    schema = ''
    schemaType = ''
    status = ''
    baseUri = ''
    history = []
    rights = []
    recordStore = None
    elementHash = {}
    resultSetItem = None

    parent = ('', None, 0)
    processHistory = []

    dom = None
    xml = ""
    sax = []

    def __init__(self, data, xml, docid=None):
        raise(NotImplementedError)

    def get_dom(self):
        """Return the DOM document node for the record."""
        raise(NotImplementedError)
    def get_sax(self):
        """Return the list of SAX events for the record, serialised according to the internal C3 format."""
        raise(NotImplementedError)
    def get_xml(self):
        """Return the XML for the record as a serialised string."""
        raise(NotImplementedError)

    def process_xpath(self, xpath, maps={}):
        """Process the given xpath (either string or compiled), perhaps with some supplied namespace mappings."""
        raise(NotImplementedError)

class Session:
    """An object to be passed around amongst the processing objects to
    maintain a session. It stores, for example, the current
    environment, user and identifier for the database."""
    user = None
    task = ""
    database = ""
    environment = ""

class Workflow(C3Object):
    """A workflow is similar to the process chain concept of an index,
    but behaves at a more global level. It will allow the
    configuration of a workflow using Cheshire3 objects and simple
    code to be defined and executed for input objects. For example,
    one might define a common workflow pattern of PreParsers, a Parser
    and then indexing routines in the XML configuration, and then run
    each document in a documentFactory through it. This allows users
    who are not familiar with Python, but who are familiar with XML
    and the Cheshire3 design to implement tasks as required by
    changing only configuration files. It thus also allows a user to
    configure personal workflows in a Cheshire3 system which they
    don't have permission to modify."""
    code = ""

    def process(self, session, *args, **kw):
        """Executes the code as constructed from the XML configuration
        on the given object. The return value is the last object to be
        produced by the execution. This function is automatically
        written and compiled when the object is instantiated."""
        raise(NotImplementedError)

class Logger(C3Object):
    def log(self, session, *args, **kw):
        raise(NotImplementedError)