Cheshire3: preParser Module

| Class | Description |
|---|---|
| NormalizerPreParser | Calls a named Normalizer to do the conversion |
| HtmlSmashPreParser | Attempts to reduce HTML to its raw text |
| RegexpSmashPreParser | Strip, replace or keep data that matches a given regular expression |
| HtmlTidyPreParser | Calls the Tidy utility to turn HTML into XHTML for parsing |
| TagStripPreParser | Strip only the named tags from the document, e.g. script, style |
| PdfToXmlPreParser | pdftohtml wrapper to turn PDF into XML |
| PdfToTxtPreParser | Convert PDF to text via the pdftotext utility |
| SgmlPreParser | Convert SGML into XML |
| AmpPreParser | Escape lone ampersands in otherwise XML text |
| MarcToXmlPreParser | Convert MARC into MARCXML |
| MarcToSgmlPreParser | Convert MARC into Cheshire2's MarcSgml |
| TxtToXmlPreParser | Minimally wrap text in `<data>` XML tags |
| GzipPreParser | Gunzip a gzipped document |
| BzipPreParser | |
| B64EncodePreParser | Encode document in Base64 |
| B64DecodePreParser | Decode document from Base64 |
| UrlPreParser | |
| OpenOfficePreParser | Use an OpenOffice server to convert documents into OpenDocument XML |
| PrintableOnlyPreParser | Replace or strip non-printable characters |
| CharacterEntityPreParser | Transform Latin-1 and broken character entities into numeric character entities |

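
The table gives only a one-line summary of each class. As a rough illustration of what a few of the purely textual transformations involve, here is a minimal standalone Python sketch. It is an approximation under stated assumptions, not Cheshire3's implementation: the function names (`escape_lone_amps`, `to_numeric_refs`, `printable_only`, `wrap_in_data`, `strip_tags`, `gunzip`, `b64_encode`, `b64_decode`) are hypothetical, they operate on plain strings or bytes rather than on Cheshire3 document objects, and details such as escaping inside the `<data>` wrapper, or whether TagStripPreParser removes an element's content as well as its tags, are guesses.

```python
import base64
import gzip
import re
from html.entities import name2codepoint
from xml.sax.saxutils import escape

# Hypothetical, standalone helpers illustrating a few preParsers above.
# None of these names are part of the Cheshire3 API.

# AmpPreParser-style: escape '&' unless it already begins a named or numeric
# reference such as &amp;, &#38; or &#x26;.
_LONE_AMP = re.compile(r"&(?!#\d+;|#[xX][0-9a-fA-F]+;|[A-Za-z][A-Za-z0-9]*;)")

def escape_lone_amps(text: str) -> str:
    return _LONE_AMP.sub("&amp;", text)

# CharacterEntityPreParser-style (assumed behaviour): rewrite named HTML
# entities, other than XML's five predefined ones, and raw non-ASCII
# characters as numeric references. Unknown entity names are left untouched.
_XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def _named_to_numeric(match):
    name = match.group(1)
    if name in _XML_PREDEFINED or name not in name2codepoint:
        return match.group(0)
    return "&#%d;" % name2codepoint[name]

def to_numeric_refs(text: str) -> str:
    text = _NAMED_ENTITY.sub(_named_to_numeric, text)
    # xmlcharrefreplace turns every non-ASCII character into &#NNN;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# PrintableOnlyPreParser-style: drop (or replace) non-printable characters,
# keeping tab, newline and carriage return.
def printable_only(text: str, replacement: str = "") -> str:
    return "".join(
        c if c.isprintable() or c in "\t\n\r" else replacement for c in text
    )

# TxtToXmlPreParser-style: wrap text in <data> tags, escaping &, < and >
# so the result is well-formed XML.
def wrap_in_data(text: str) -> str:
    return "<data>" + escape(text) + "</data>"

# TagStripPreParser-style (assumption): remove the named elements together
# with their content, case-insensitively.
def strip_tags(html: str, tags=("script", "style")) -> str:
    for tag in tags:
        t = re.escape(tag)
        pattern = r"<%s\b[^>]*>.*?</%s\s*>" % (t, t)
        html = re.sub(pattern, "", html, flags=re.IGNORECASE | re.DOTALL)
    return html

# GzipPreParser / B64EncodePreParser / B64DecodePreParser-style wrappers
# around the standard library.
def gunzip(data: bytes) -> bytes:
    return gzip.decompress(data)

def b64_encode(data: bytes) -> bytes:
    return base64.b64encode(data)

def b64_decode(data: bytes) -> bytes:
    return base64.b64decode(data)

print(escape_lone_amps("AT&T &amp; &#38;"))  # AT&amp;T &amp; &#38;
print(to_numeric_refs("caf&eacute; café"))   # caf&#233; caf&#233;
```

Each helper isolates only the string transformation that the corresponding class names; in Cheshire3 itself the preParsers are configured objects run as steps in a processing workflow before the document is parsed.
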
Generated by Epydoc 3.0alpha2 on Wed Aug 9 18:09:56 2006 | http://epydoc.sf.net |