User-defined output formats

Orange provides a mechanism for defining new output formats and even for changing the way objects are printed by default.

All classes derived from class Orange define the following two output methods.

Methods

dump(format)
Returns a string that represents the object in the given format. For instance, to print a classifier in XML format, use print classifier.dump("xml").
write(format, file)
Writes the object's representation in the given format to file, which must be an open file object.
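The two methods are closely related: write presumably sends to the file the same text that dump returns. The sketch below models that relationship in plain Python so it runs without Orange. The class name OutputObject, its _formatters dictionary and the ValueError raised for an unregistered format are all hypothetical; this is not Orange's implementation.

```python
import io


class OutputObject:
    """Hypothetical stand-in for an Orange object (not Orange's implementation)."""

    # Assumed registry: format name -> function(obj) returning a string.
    _formatters = {}

    def dump(self, format):
        """Return the object's representation in the given format."""
        try:
            formatter = self._formatters[format]
        except KeyError:
            # Assumption: an unregistered format is reported as an error.
            raise ValueError("no output function registered for format %r" % format)
        return formatter(self)

    def write(self, format, file):
        """Write the same text that dump() returns to an already open file."""
        file.write(self.dump(format))


# Usage: register a trivial "ascii" format and write it to an in-memory file.
OutputObject._formatters["ascii"] = lambda obj: "<%s>" % type(obj).__name__
buf = io.StringIO()
OutputObject().write("ascii", buf)
print(buf.getvalue())  # <OutputObject>
```

Keeping write as a thin wrapper around dump means each format needs only one function that returns a string; writing to a file comes for free.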

Which formats are supported? None, by default. That is the strength of this mechanism: you can define new output formats, write the corresponding functions and register them, or let others do it for you. For instance, someone could write a module containing functions that represent various classes in XML format and register each of them for the "xml" format.

There are two functions that deal with output registration.

Functions

orange.setoutput(class, format, function)
Sets the function for printing instances of class in the given format. class can be any Orange class; defining output in a format for a class also defines it for its subclasses, unless a subclass registers its own function for that format. function must accept a single argument, an instance of class, and return a string with the instance's representation in the given format. format can be any string; prefer conventional names such as "ps", "xml" or "ascii".
orange.removeoutput(class, format)
Removes the function registered for printing instances of class in the given format.
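The inheritance rule described for setoutput (a format registered for a class also applies to its subclasses unless one of them registers its own function) can be modelled by walking each class's method resolution order. The following is a minimal pure-Python sketch under that assumption. The names set_output, remove_output, dump, Classifier and TreeClassifier are hypothetical and not part of Orange, and the fallback to the base class after removal is assumed rather than documented.

```python
# Hypothetical registry: (class, format) -> function(instance) -> str
_outputs = {}


def set_output(cls, format, function):
    """Register function for printing instances of cls in the given format."""
    _outputs[(cls, format)] = function


def remove_output(cls, format):
    """Remove the function registered for cls itself (not for its base classes)."""
    _outputs.pop((cls, format), None)


def dump(obj, format):
    """Use the most specific function registered along the class hierarchy."""
    for cls in type(obj).__mro__:
        function = _outputs.get((cls, format))
        if function is not None:
            return function(obj)
    raise ValueError("format %r is not supported for %s" % (format, type(obj).__name__))


class Classifier:
    pass


class TreeClassifier(Classifier):
    pass


set_output(Classifier, "xml", lambda c: "<classifier/>")
print(dump(TreeClassifier(), "xml"))  # <classifier/> (inherited from Classifier)

set_output(TreeClassifier, "xml", lambda c: "<tree/>")
print(dump(TreeClassifier(), "xml"))  # <tree/> (the subclass overrides it)

remove_output(TreeClassifier, "xml")
print(dump(TreeClassifier(), "xml"))  # <classifier/> again
```

Searching the MRO from the most derived class upward is what makes a subclass registration take precedence while still letting unregistered subclasses reuse their base class's format.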

Output can only be redefined for classes derived from class Orange; the two classes that are not derived from it, and whose output therefore cannot be redefined, are Value and Example.

For example, here's a simple function for printing a contingency matrix in a tab-delimited format.

part of output.py

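import orange

def printTabDelimContingency(c):
    # Only a contingency between two discrete variables can be laid out as a table
    if c.innerVariable.varType != orange.VarTypes.Discrete or \
       c.outerVariable.varType != orange.VarTypes.Discrete:
        raise ValueError("printTabDelimContingency can only handle discrete contingencies")
    # Header row: one tab-separated column per value of the inner variable
    res = ""
    for v in c.innerVariable.values:
        res += "\t%s" % v
    res += "\n"
    # One row per value of the outer variable: its name, then its distribution
    for i in range(len(c.outerVariable.values)):
        res += c.outerVariable.values[i]
        for v in c[i]:
            res += "\t%5.3f" % v
        res += "\n"
    return res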

To register the function with format "tab", we call setoutput.

orange.setoutput(orange.Contingency, "tab", printTabDelimContingency)

Let us now read some data, construct a contingency matrix and print it out.

part of output.py (uses monk1.tab)

data = orange.ExampleTable("monk1")
cont = orange.ContingencyAttrClass("e", data)
print cont.dump("tab")
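Because the printed table depends on the data, it can help to see the layout in isolation. Below, the same row and column logic as printTabDelimContingency is applied to plain Python lists, so it runs without Orange. The helper name tab_delimited and the numbers are made up for illustration; they are not the monk1 contingency.

```python
def tab_delimited(row_labels, column_labels, matrix):
    """Hypothetical helper mirroring the layout of printTabDelimContingency."""
    # Header row: an empty corner cell, then one column per inner value
    lines = ["".join("\t%s" % label for label in column_labels)]
    # One row per outer value: its label followed by the row's numbers
    for label, row in zip(row_labels, matrix):
        lines.append(label + "".join("\t%5.3f" % v for v in row))
    return "\n".join(lines) + "\n"


# Made-up counts, used only to show the shape of the output.
print(tab_delimited(["1", "2"], ["0", "1"], [[3.0, 1.0], [0.0, 4.0]]))
```

Building a list of lines and joining it once avoids repeated string concatenation, which is the usual Python idiom for assembling larger text.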

Why is this useful? First, by using standard format names and supporting each format for various classes, you can make the output more systematic. For instance, you can define an XML schema for various classifiers (or use a standard one, such as PMML), write the corresponding functions and register them with setoutput. Afterwards, you can call classifier.write("xml", file) for any type of classifier, as long as it supports xml output, without needing to know which type of classifier it is.

The second reason is that this mechanism lets you redefine the standard output. The formats used for standard output are "repr" (used by the function repr() and by reverse quotes) and "str" (used by str() and the print statement). When objects are printed in an interactive session, the former format, "repr", is used. The format names follow Python's standard methods for converting values into strings, __repr__ and __str__. If you find the distinction confusing, simply redefine both formats.

So, to make print use our output function, we register it for the "str" format:

part of output.py (uses monk1.tab)

orange.setoutput(orange.Contingency, "str", printTabDelimContingency)
print cont