from configParser import C3Object

    """A Server object is a collection point for other objects, and an
    initial entry into the system for requests from a
    ProtocolHandler. A server might know about several Databases,
    RecordStores and so forth, but its main function is to check
    whether the request should be accepted or not and create an
    environment in which the request can be processed. It will likely
    have access to a UserStore database which maintains authentication
    and authorisation information. The exact nature of this
    information is not defined, allowing many possible backend
    implementations. Servers are the top level of configuration for
    the system and hence their constructor requires the path to a
    local XML configuration file; from then on, however, configuration
    information may be retrieved from other locations such as a remote
    datastore to enable distributed environments to maintain
    synchronicity."""

    databases = {}
    authStore = None
    resultSetStore = None
    queryStore = None

    def __init__(self, session, configFile="serverConfig.xml"):
        """The constructor takes a Session object and a file path to
        the configuration file to be parsed and processed.
        """
        raise(NotImplementedError)

    def connect(self, session, connection):
        """Accept a connection to the server and perform any per-session initialisation."""
        raise(NotImplementedError)

        """Called when a connection is dropped to perform any end-of-session tidying."""
        raise(NotImplementedError)

        "Handle authentication of session/request and establish user profile."
        raise(NotImplementedError)


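# A minimal sketch of the Server role described above: accept or reject a
# request and keep per-session state. SketchServer, its toy username/password
# dict and the method signatures are assumptions made for illustration only;
# the real interface leaves the authentication backend undefined.

```python
class SketchServer:
    """Hypothetical stand-in, not a Cheshire3 class."""

    def __init__(self, users):
        self.databases = {}    # identifier -> database object
        self.users = users     # username -> password (toy auth store)
        self.sessions = set()  # identifiers of connected sessions

    def connect(self, session_id):
        # Per-session initialisation: here, just remember the session.
        self.sessions.add(session_id)

    def disconnect(self, session_id):
        # End-of-session tidying: forget the session if it was known.
        self.sessions.discard(session_id)

    def authenticate(self, username, password):
        # Accept the request only if the credentials match the toy store.
        return username in self.users and self.users[username] == password


server = SketchServer({"alice": "secret"})
server.connect("s1")
accepted = server.authenticate("alice", "secret")
rejected = server.authenticate("alice", "wrong")
server.disconnect("s1")
```
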
    """A Database is a collection of Records and Indexes. It is
    responsible for maintaining and allowing access to its components,
    as well as metadata associated with the collections. It must be
    able to interpret a request, splitting it amongst its known
    resources and then recombine the values into a single response."""
    indexes = {}
    protocolMaps = {}
    recordStore = None

        """Ensure that a record is registered with the database. This
        function does not ensure persistence of the record, nor index
        it; it just performs registration."""
        raise(NotImplementedError)
        """Unregister the record."""
        raise(NotImplementedError)

        """Sends the record to all indexes registered with the database to be indexed."""
        raise(NotImplementedError)
        """Reindex all records registered with the database."""
        raise(NotImplementedError)
        """Sends the record to all indexes registered with the database to be removed/unindexed."""
        raise(NotImplementedError)

        """Perform tasks before records are to be indexed."""
        raise(NotImplementedError)
        """Perform tasks after records have been sent to indexes. For
        example, commit any temporary data to IndexStores."""
        raise(NotImplementedError)
        """Ensure persistence of database metadata."""
        raise(NotImplementedError)

    def scan(self, session, query, numReq, direction=">="):
        """Given a CQL query (single searchClause), resolve the index
        and return a section of the term list."""
        raise(NotImplementedError)
    def search(self, session, query):
        """Given a CQL query, perform the query and return a resultSet."""
        raise(NotImplementedError)
    def sort(self, session, resultSets, sortKeys):
        """Take one or more resultSets and sort/merge by sortKeys."""
        raise(NotImplementedError)

        """Authenticate user against database."""
        raise(NotImplementedError)
        """Perform database specific session initialisation."""
        raise(NotImplementedError)
        """Perform any database specific session tidying on completion."""
        raise(NotImplementedError)



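# The Database docstring says a request is split amongst known resources and
# the values recombined into one response. A rough sketch of that idea follows.
# ToyIndex, ToyDatabase and the (index name, term) clause format are
# hypothetical; real requests are CQL queries, which this does not parse.

```python
class ToyIndex:
    """Hypothetical in-memory index: term -> set of record ids."""

    def __init__(self):
        self.postings = {}

    def index_record(self, rec_id, text):
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(rec_id)

    def search(self, term):
        return self.postings.get(term.lower(), set())


class ToyDatabase:
    def __init__(self, indexes):
        self.indexes = indexes  # index name -> ToyIndex

    def index_record(self, rec_id, fields):
        # Send each field to the index registered under the same name.
        for name, text in fields.items():
            if name in self.indexes:
                self.indexes[name].index_record(rec_id, text)

    def search(self, clauses):
        # Split the request per index, then AND the partial results together.
        result = None
        for name, term in clauses:
            hits = self.indexes[name].search(term)
            result = hits if result is None else result & hits
        return sorted(result or [])


db = ToyDatabase({"title": ToyIndex(), "author": ToyIndex()})
db.index_record(1, {"title": "Moby Dick", "author": "Herman Melville"})
db.index_record(2, {"title": "Billy Budd", "author": "Herman Melville"})
hits = db.search([("author", "melville"), ("title", "dick")])
```
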
    """An Index is an object which defines an access point into
    records and is responsible for extracting that information from
    them. It can then store the information extracted in an
    IndexStore. The entry point can be defined using one or more XPath
    expressions, and the extraction process can be defined using a
    workflow chain of standard objects. These chains must start with
    an Extractor, but from there might then include PreParsers,
    Parsers, Transformers, Normalizers and even other Indexes. The
    index can also be the last object in a regular Workflow, so long
    as an XPath object is used to find the data in the record
    immediately before an Extractor."""
    indexStore = None

        """Perform tasks before indexing any records."""
        raise(NotImplementedError)
        """Perform tasks after records have been indexed."""
        raise(NotImplementedError)
        """Accept a record to index. If begin_indexing has been
        called, the index might not commit any data until
        commit_indexing is called. If it is not in batch mode, then
        index_record will also commit the terms to the indexStore."""
        raise(NotImplementedError)
        """Delete all the terms of the given record from the indexes.
        Does this by extracting the terms from the record, finding and
        removing them. Hence the record must be the same as the one
        that was indexed."""
        raise(NotImplementedError)

    def scan(self, session, clause):
        """Produce an ordered term list with document frequencies and total occurrences."""
        raise(NotImplementedError)
    def search(self, session, clause):
        """Search this particular index given a CQL clause, return a resultSet object."""
        raise(NotImplementedError)
    def sort(self, session, resultSet):
        """Sort a result set based on the values extracted according to this index."""
        raise(NotImplementedError)

        """Callback from IndexStore to serialise list of terms and
        document references to be stored."""
        raise(NotImplementedError)
        """Callback from IndexStore to take serialised data and
        produce list of terms and document references."""
        raise(NotImplementedError)
    def merge_terms(self, session, structTerms, newTerms, op="replace", recs=0, occs=0):
        """Callback from IndexStore to take two sets of terms and merge them together."""
        raise(NotImplementedError)
        """Take a single item, as stored in this Index, and produce a ResultSetItem representation."""
        raise(NotImplementedError)
        """Take a list of terms and produce an appropriate ResultSet object."""
        raise(NotImplementedError)



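# The batch behaviour described for index_record can be sketched as follows:
# outside a batch, terms are committed at once; between begin_indexing and
# commit_indexing they are held back. ToyBatchIndex and its simplified
# signatures (no session argument, terms passed in directly) are assumptions.

```python
class ToyBatchIndex:
    """Hypothetical index with an optional batch mode."""

    def __init__(self):
        self.store = {}      # committed data: term -> list of record ids
        self.pending = None  # None means "not in batch mode"

    def begin_indexing(self):
        self.pending = {}

    def index_record(self, rec_id, terms):
        # In batch mode, write to the pending buffer instead of the store.
        target = self.store if self.pending is None else self.pending
        for term in terms:
            target.setdefault(term, []).append(rec_id)

    def commit_indexing(self):
        # Merge everything gathered during the batch into the store.
        for term, rec_ids in (self.pending or {}).items():
            self.store.setdefault(term, []).extend(rec_ids)
        self.pending = None


index = ToyBatchIndex()
index.index_record(1, ["whale"])          # no batch: committed immediately
index.begin_indexing()
index.index_record(2, ["whale", "ship"])  # batch: held back until commit
before_commit = {term: list(ids) for term, ids in index.store.items()}
index.commit_indexing()
```
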
    """An XPathProcessor is a simple wrapper around an XPath. It is used
    to evaluate the XPath expression according to a given record in
    workflows."""
        """Process the XPath for the given record and return the results."""
        raise(NotImplementedError)



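# A small sketch of wrapping one XPath and evaluating it against records.
# ToyXPathProcessor and process_record are hypothetical names, and the record
# is plain XML text here. The standard library's ElementTree supports only a
# limited XPath subset, so this is a stand-in for a full XPath engine.

```python
import xml.etree.ElementTree as ET


class ToyXPathProcessor:
    def __init__(self, path):
        self.path = path  # XPath string within ElementTree's subset

    def process_record(self, record_xml):
        # Parse the record and return the text of every matching element.
        root = ET.fromstring(record_xml)
        return [element.text for element in root.findall(self.path)]


processor = ToyXPathProcessor(".//title")
titles = processor.process_record(
    "<book><title>Moby Dick</title><part><title>Loomings</title></part></book>"
)
```
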
    """An Extractor is a processing object called by an Index with the
    value of an evaluated XPath expression or with a string. Example
    Extractors might extract keywords from an element or the entire
    contents thereof as a single string. Extractors must also be used
    on the query terms to apply the same keyword processing rules, for
    example."""
        """Process a raw string, e.g. from an attribute value or the query."""
        raise(NotImplementedError)
        """Process a DOM node."""
        raise(NotImplementedError)
        """Process a list of SAX events serialised in C3 internal format."""
        raise(NotImplementedError)
        """Process the result of an XPath expression. Convenience
        function to wrap the other process_* functions and do type
        checking."""
        raise(NotImplementedError)


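# The two extraction styles mentioned in the Extractor docstring, keywords
# versus the entire contents as one string, might look like this. Both
# function names are hypothetical, and the \w+ tokenisation rule is an
# assumption. The same function is applied to record text and to query text.

```python
import re


def extract_keywords(text):
    """Split a string into individual word tokens."""
    return re.findall(r"\w+", text)


def extract_exact(text):
    """Treat the whole string as a single term, trimming outer whitespace."""
    return [text.strip()]


record_terms = extract_keywords("The White Whale")
query_terms = extract_keywords("white whale")
whole_string = extract_exact("  The White Whale ")
```
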
    """Normalizer objects are chained after Extractors in order to
    transform the data from the record or query. A Normalizer might
    standardise case, perform stemming or transform a date into
    ISO8601 format. Normalizers are also needed to transform the terms
    in a request into the same format as the term in the Index. For
    example a date index might be searched using a free text date and
    that would need to be parsed into the normalised form in order to
    compare it with the stored data."""

        """Process a string into an alternative form."""
        raise(NotImplementedError)

        """Process a hash of values into alternative forms."""
        raise(NotImplementedError)



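# A sketch of the date example from the Normalizer docstring: a free text date
# in a query and the date in a record are both reduced to the same ISO 8601
# string so they can be compared. The function names and the list of accepted
# input formats are assumptions; %B month names also depend on the locale
# (English month names are assumed here).

```python
from datetime import datetime


def normalize_case(term):
    return term.lower()


def normalize_date(term, formats=("%d %B %Y", "%B %d, %Y", "%Y-%m-%d")):
    """Try each assumed input format and return an ISO 8601 date string."""
    for fmt in formats:
        try:
            return datetime.strptime(term, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError("unrecognised date: %r" % term)


stored = normalize_date("4 July 1976")    # as found in a record
queried = normalize_date("July 4, 1976")  # as typed in a query
```
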
    """ Object Docs """
    def load(self, session, data, cache=None, format=None, tag=None, documentType=None, codec=""):
        raise(NotImplementedError)
        raise(NotImplementedError)

        pass



    """Normally a simple wrapper around an XML parser, these objects
    can be viewed as Record Factories. They take a Document containing
    some XML and produce the equivalent Record."""
        """Take a Document, parse it and return a Record object."""
        raise(NotImplementedError)


    """A PreParser takes a Document and creates a second one. For
    example, the input document might consist solely of a URL. The
    output would be a Document with the data that the PreParser has
    fetched from that address. This functionality allows for work flow
    chains to be strung together in many ways, and perhaps in ways
    which the original implementation had not foreseen."""
        """Take a Document, transform it and return a new Document object."""
        raise(NotImplementedError)



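# The PreParser-then-Parser chain (Document in, Document out, then Document
# in, Record out) can be sketched with plain strings standing in for Documents
# and a minidom tree standing in for a Record. Both function names are
# hypothetical, and the clean-up step replaces the URL-fetching example from
# the docstring so that the sketch needs no network access.

```python
from xml.dom.minidom import parseString


def cleanup_preparser(document):
    """PreParser stand-in: drop a leading byte order mark and outer whitespace."""
    return document.lstrip("\ufeff").strip()


def dom_parser(document):
    """Parser stand-in: turn XML text into a parsed DOM 'record'."""
    return parseString(document)


raw = "\ufeff  <book><title>Moby Dick</title></book>\n"
record = dom_parser(cleanup_preparser(raw))
title = record.getElementsByTagName("title")[0].firstChild.data
```
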
class User(C3Object):
    """An object representing a user of the system to allow for
    convenient access to properties such as username, password and
    rights metadata."""

    username = ''
    password = ''
    rights = []
    email = ''
    realName = ''
    flags = []

    def hasFlag(self, session, flag, object=None):
        """Check whether or not the user has the specified flag. This
        flag may be set regarding a particular object, for example
        write access to a particular store."""
        raise(NotImplementedError)


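# One possible reading of hasFlag, where a flag is either global or scoped to
# particular objects. ToyUser, the dict-of-sets layout and the use of None to
# mean "applies to every object" are assumptions; the interface above only
# declares flags as a list and does not define its contents.

```python
class ToyUser:
    def __init__(self, username, flags):
        self.username = username
        # flag name -> set of object ids it applies to, or None for all objects
        self.flags = flags

    def has_flag(self, flag, obj_id=None):
        if flag not in self.flags:
            return False
        scope = self.flags[flag]
        # A global flag covers every object; a scoped flag must name the object.
        return scope is None or obj_id in scope


user = ToyUser("alice", {"read": None, "write": {"recordStore1"}})
```
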
    """Protocol maps map from an incoming query type to internal indexes based on some specification."""
    protocol = ""
        """Given a query, resolve it and return the index object to be used."""
        raise(NotImplementedError)



    """A persistent storage mechanism for extracted terms."""

        """Perform tasks as required before indexing begins, for example setting batch files."""
        raise(NotImplementedError)
        """Commit the data to disk from the indexing process."""
        raise(NotImplementedError)

        """Does the IndexStore currently store the given Index?"""
        raise(NotImplementedError)
        """Create an index in the store."""
        raise(NotImplementedError)
        """Remove all the terms from an index, but keep the specification."""
        raise(NotImplementedError)
        """Completely delete an index from the store."""
        raise(NotImplementedError)

        """Fetch a list of all indexes stored in this IndexStore."""
        raise(NotImplementedError)
        """Fetch statistics (such as size) about the given Index."""
        raise(NotImplementedError)

    def delete_terms(self, session, index, terms, record=None):
        """Delete the given terms from an index, optionally only for a particular record."""
        raise(NotImplementedError)
        """Store terms in the index for a given record."""
        raise(NotImplementedError)
    def fetch_term(self, session, index, term, summary=0, reverse=0):
        """Fetch a single term."""
        raise(NotImplementedError)
    def fetch_termList(self, session, index, term, numReq=0, relation="", end="", summary=0, reverse=0):
        """Fetch a list of terms for an index.
        numReq: How many terms are wanted.
        relation: Which order to scan through the index.
        end: A point to end at (e.g. between A and B).
        summary: Only return frequency info, not the pointers to matching records.
        reverse: Use the reversed index if available (e.g. 'xedni' not 'index').
        """
        raise(NotImplementedError)
        """Fetch a stored data value for the given record."""
        raise(NotImplementedError)



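# A sketch of how numReq and relation could drive a scan over a sorted term
# list, using bisect to find the starting point. fetch_term_list is a
# hypothetical free function, not the method above; treating numReq=0 as
# "no limit" and relation as one of ">=", ">", "<=", "<" are assumptions,
# since the interface does not spell out these semantics.

```python
import bisect


def fetch_term_list(terms, start, num_req=0, relation=">="):
    """Scan a sorted list of terms from start in the direction given by relation."""
    if relation == ">=":
        found = terms[bisect.bisect_left(terms, start):]
    elif relation == ">":
        found = terms[bisect.bisect_right(terms, start):]
    elif relation == "<=":
        # Scan downwards, so reverse the slice that ends at the start point.
        found = terms[:bisect.bisect_right(terms, start)][::-1]
    elif relation == "<":
        found = terms[:bisect.bisect_left(terms, start)][::-1]
    else:
        raise ValueError("unsupported relation: %r" % relation)
    return found[:num_req] if num_req else found


terms = ["apple", "banana", "cherry", "damson"]
forward = fetch_term_list(terms, "b", num_req=2)
backward = fetch_term_list(terms, "cherry", num_req=2, relation="<=")
```
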
    """An interface to a persistent storage mechanism for configured Cheshire3 objects."""
        """Given a Cheshire3 object, create a serialised form of it in the database.
        Note: You should use create_record() as per RecordStore to create an object from a configuration.
        """
        raise(NotImplementedError)
        """Delete an object."""
        raise(NotImplementedError)
        """Fetch an object."""
        raise(NotImplementedError)
        """Store an object, potentially overwriting an existing copy."""
        raise(NotImplementedError)


    """An interface to persistent storage for Queries."""
        """Create a new query in the store."""
        raise(NotImplementedError)
        """Delete a query from the store."""
        raise(NotImplementedError)
        """Fetch a query from the store."""
        raise(NotImplementedError)
        """Fetch a list of identifiers associated with queries in the store."""
        raise(NotImplementedError)
        """Store a query, potentially overwriting an existing copy."""
        raise(NotImplementedError)



    """A persistent storage mechanism for Records. It allows such
    operations as create, update, fetch and delete. It also allows
    fast retrieval of important record metadata, for use with
    computing relevance ranking for example."""

        """Create a record and assign it a new identifier."""
        raise(NotImplementedError)
        """Permissions checking hook for store_record if the record does not already exist."""
        raise(NotImplementedError)
        """Store an existing record. It must already have an identifier assigned to it."""
        raise(NotImplementedError)
        """Return the record with the given identifier."""
        raise(NotImplementedError)
        """Delete the record with the given identifier from storage."""
        raise(NotImplementedError)
        """Return the size of the record, according to its metadata."""
        raise(NotImplementedError)
        """Return a checksum for the record, if one is available."""
        raise(NotImplementedError)


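# A dictionary-backed sketch of the create, store, fetch and delete operations
# plus the size and checksum metadata listed above. Only create_record and
# store_record are names mentioned in the docstrings; the class, the other
# method names, the integer identifiers and the choice of SHA-256 are
# assumptions. A real store would be persistent rather than in memory.

```python
import hashlib


class ToyRecordStore:
    def __init__(self):
        self._records = {}  # identifier -> XML string
        self._next_id = 0

    def create_record(self, xml):
        # Assign a fresh identifier and store the record under it.
        rec_id = self._next_id
        self._next_id += 1
        self._records[rec_id] = xml
        return rec_id

    def store_record(self, rec_id, xml):
        # Overwrite (or add) the record that already has an identifier.
        self._records[rec_id] = xml

    def fetch_record(self, rec_id):
        return self._records[rec_id]

    def delete_record(self, rec_id):
        del self._records[rec_id]

    def fetch_size(self, rec_id):
        return len(self._records[rec_id].encode("utf-8"))

    def fetch_checksum(self, rec_id):
        data = self._records[rec_id].encode("utf-8")
        return hashlib.sha256(data).hexdigest()


store = ToyRecordStore()
rec_id = store.create_record("<a/>")
size = store.fetch_size(rec_id)
store.store_record(rec_id, "<b>x</b>")
updated = store.fetch_record(rec_id)
```
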
    """An interface to a persistent storage mechanism for Documents and their associated metadata."""
        raise(NotImplementedError)
        raise(NotImplementedError)
        raise(NotImplementedError)
        raise(NotImplementedError)
        raise(NotImplementedError)


    """A persistent storage mechanism for result sets."""
        raise(NotImplementedError)
        raise(NotImplementedError)
        raise(NotImplementedError)
        raise(NotImplementedError)
        raise(NotImplementedError)




    """A Document is the raw data which will become a record. It may
    be processed into a Record by a Parser, or into another Document
    type by a PreParser. Documents might be stored in a DocumentStore,
    if necessary, but can generally be discarded. Documents may be
    anything from a JPG file, to an unparsed XML file, to a string
    containing a URL. This allows for future compatibility with new
    formats, as they may be incorporated into the system by
    implementing a Document type and a PreParser."""
    id = -1
    documentStore = ""
    text = ""
    mimeType = ""
    processHistory = []
    parent = ('','',-1)

    def __init__(self, data, creator="", history=[], mimeType="", parent=None):
        """The constructor takes the data which should be used to
        construct the document. This is implementation dependent. It
        may also optionally take a creator object, process history
        information and a mimetype. The parent option is for
        documents which have been extracted from another document, for
        example pages from a book."""
        raise(NotImplementedError)
        """Return the raw data associated with this document."""
        raise(NotImplementedError)



    """An object representing a pointer to a Record, with result set specific metadata."""
    docid = 0
    recordStore = ""
    occurences = 0


    """Records in the system are stored in an XML form. Attached to
    the record is various configurable metadata, such as the time it
    was inserted into the database and by which user. Records are
    stored in a RecordStore database and retrieved via a persistent
    and unique document identifier. The record data may be retrieved
    as a list of SAX events, as regularised XML or as a DOM tree."""
    schema = ''
    schemaType = ''
    status = ''
    baseUri = ''
    history = []
    rights = []
    recordStore = None
    elementHash = {}
    resultSetItem = None

    parent = ('', None, 0)
    processHistory = []

    dom = None
    xml = ""
    sax = []

    def __init__(self, data, xml, docid=None):
        raise(NotImplementedError)

        """Return the DOM document node for the record."""
        raise(NotImplementedError)
        """Return the list of SAX events for the record, serialised according to the internal C3 format."""
        raise(NotImplementedError)
        """Return the XML for the record as a serialised string."""
        raise(NotImplementedError)

        """Process the given xpath (either string or compiled), perhaps with some supplied namespace mappings."""
        raise(NotImplementedError)

    """An object to be passed around amongst the processing objects to
    maintain a session. It stores, for example, the current
    environment, user and identifier for the database."""
    user = None
    task = ""
    database = ""
    environment = ""

    """A workflow is similar to the process chain concept of an index,
    but behaves at a more global level. It will allow the
    configuration of a workflow using Cheshire3 objects and simple
    code to be defined and executed for input objects. For example,
    one might define a common workflow pattern of PreParsers, a Parser
    and then indexing routines in the XML configuration, and then run
    each document in a documentFactory through it. This allows users
    who are not familiar with Python, but who are familiar with XML
    and the Cheshire3 design to implement tasks as required by
    changing only configuration files. It thus also allows a user to
    configure personal workflows in a Cheshire3 system which they
    don't have permission to modify."""
    code = ""

    def process(self, session, *args, **kw):
        """Executes the code as constructed from the XML configuration
        on the given object. The return value is the last object to be
        produced by the execution. This function is automatically
        written and compiled when the object is instantiated."""
        raise(NotImplementedError)

    def log(self, session, *args, **kw):
        raise(NotImplementedError)

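# The Workflow idea, a process function built from configuration whose return
# value is the last object produced, can be approximated by composing a list
# of callables. build_workflow is hypothetical, and a Python list stands in
# for the XML configuration; the real object generates and compiles code.

```python
def build_workflow(steps):
    """Compose single-argument callables into one process function."""
    def process(obj):
        # Feed the output of each step into the next one.
        for step in steps:
            obj = step(obj)
        return obj  # the last object produced by the chain
    return process


workflow = build_workflow([str.strip, str.lower, str.split])
result = workflow("  Call Me Ishmael  ")
```

# Changing the behaviour then means changing only the list of steps, which
# mirrors the docstring's point about editing configuration instead of code.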