Orange Peculiarities

Orange has several features that will seem peculiar to an experienced Python programmer. Orange was conceived as a machine learning library that, "incidentally", has an interface to Python, and it is tailored to suit the machine learning community. The other reason for the differences from typical Python modules is that we have not always followed novelties in Python, but have added several extensions of our own.


Schizophrenic Constructors

In ordinary Python classes, constructors of class A construct instances of class A. It is hard to imagine otherwise. One of Python's PEPs, however, briefly mentions that it is possible for a built-in class to return an instance of another type. Orange classes are implemented as built-in classes and exploit this possibility to offer a shortcut for construction of certain objects.

Typical examples of this are learners and classifiers. Basically, learners are objects that can be called with learning examples as an argument and return a classifier. Classifiers, in turn, are objects that can be called with an example and predict its class (or probabilities for the classes). For instance, if data contains learning examples, we can construct a Bayesian learner and classifier like this:

learner = orange.BayesLearner()
classifier = learner(data)

Everything is normal here. orange.BayesLearner() returns an instance of orange.BayesLearner. Many users would, however, prefer to do it like this:

classifier = orange.BayesLearner()(data)

The learner is constructed and immediately called. In order to avoid extra parentheses, orange.BayesLearner's constructor can accept data, call learning and return a classifier:

classifier = orange.BayesLearner(data)

The same can be done with most classes whose main purpose is construction of another class through the call operator (in the same way as orange.BayesLearner's main function is construction of orange.BayesClassifier). If the constructor is passed the arguments that would be passed to the call, the constructor will execute the call itself and return the corresponding instance.

Is there a way to program similar constructors in Python? Yes, to some extent. The trick is to have a function instead of a class. Here's a skeleton of the Python code that would simulate the BayesLearner's behaviour.

def BayesLearner(data=None, weightId=0):
    learner = BayesLearnerClass()
    # compare with None so that an empty data set is still passed on
    if data is not None:
        return learner(data, weightId)
    else:
        return learner

class BayesLearnerClass:
    def __init__(self):
        # here comes initialization
        pass

    def __call__(self, data, weightId=0):
        # here comes learning; in the end, you can
        # return an instance of BayesClassifierClass
        return BayesClassifierClass()

class BayesClassifierClass:
    # ... and so on
    pass

The difference between this and what Orange offers is that type(BayesLearner) is <type 'function'>, while for Orange classes type(orange.BayesLearner) is not a function but a class type.
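In Python versions with new-style classes, the behaviour can be imitated more closely with a class instead of a function, because __new__ may return an object of another type (Python then does not call __init__ on it). The following is only a sketch of that idea in pure Python and says nothing about how Orange implements it. SketchLearner and SketchClassifier are hypothetical names, and the "learning" is a placeholder that merely finds the majority class.

```python
class SketchClassifier(object):
    """Hypothetical classifier that always predicts the majority class."""
    def __init__(self, majority):
        self.majority = majority

    def __call__(self, example):
        return self.majority


class SketchLearner(object):
    """Hypothetical learner whose constructor can also do the learning."""
    def __new__(cls, data=None):
        learner = object.__new__(cls)
        if data is None:
            # No data given: behave like an ordinary constructor.
            return learner
        # Data given: call the learner at once and return the classifier.
        # Python skips __init__ because the result is not a SketchLearner.
        return learner(data)

    def __call__(self, data):
        # Placeholder "learning": data is a list of (features, class) pairs.
        classes = [class_value for _, class_value in data]
        return SketchClassifier(max(set(classes), key=classes.count))


data = [((1, 0), "yes"), ((0, 1), "no"), ((1, 1), "yes")]
learner = SketchLearner()          # an instance of SketchLearner
classifier = SketchLearner(data)   # an instance of SketchClassifier
```

Here type(SketchLearner) is a class type rather than a function, which is closer to what Orange offers.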

Passing Attributes to Constructors

Another peculiarity of constructors is that they accept values of attributes to be set. Essentially, everything that is passed as a keyword argument is copied to the instance's dictionary, just as if constructors were defined like this:

class MyClass:
    def __init__(self, <positional arguments>, **args):
        self.__dict__.update(args)
        <...>

This will significantly reduce the length of your scripts. Instead of

treelearner = orange.TreeLearner()
treelearner.storeExamples = 1
tree = treelearner(data)

you can, using this trick and the one from the previous section, simply say

tree = orange.TreeLearner(data, storeExamples = 1)

The constructor will construct an orange.TreeLearner, set its attribute storeExamples, call it with the data and return the resulting tree classifier.
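The combination of the two tricks can be sketched in pure Python as follows. This is an illustration only, not Orange's code: SketchTreeLearner and make_tree are hypothetical names, and the "tree" is just a dictionary so that the example runs without Orange.

```python
class SketchTreeLearner(object):
    """Hypothetical stand-in for a learner; not Orange's implementation."""
    storeExamples = 0  # default value of the attribute

    def __init__(self, **attributes):
        # Copy every keyword argument into the instance's dictionary.
        self.__dict__.update(attributes)

    def __call__(self, data):
        # Placeholder "learning": return a dict describing the result.
        stored = list(data) if self.storeExamples else None
        return {"examples": stored, "size": len(data)}


def make_tree(data=None, **attributes):
    """Construct a learner, set its attributes and, if data is given, call it."""
    learner = SketchTreeLearner(**attributes)
    if data is None:
        return learner
    return learner(data)


data = [1, 2, 3]

# Long form: construct, set the attribute, call.
treelearner = make_tree()
treelearner.storeExamples = 1
tree_long = treelearner(data)

# Short form: everything in one call.
tree_short = make_tree(data, storeExamples=1)
```

Both forms give the same result; the short form only saves typing.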

Important note: Previously, Orange also accepted keyword arguments to calls, which were stored as attribute values by call operators. This was removed for several reasons. Most importantly, statements like filtered_data = filter(data, negate=1) were sometimes understood as if negate was set to 1 only for the duration of the call, while in fact it was stored and remained in effect for the subsequent calls. The other reason is that Python now uses this syntax for passing positional arguments by name (this wasn't so when we began developing Orange).

Passing attribute values in calls was seldom used, so we felt we could remove this functionality without major inconvenience. Passing attribute values in a call now raises an exception saying that the call doesn't support keyword arguments; if you see this in your code, please move the attribute assignment above the call.

There are two cases where this functionality was handy: setting 'negate' in filters and specifying probabilities or the number of folds in descendants of MakeRandomIndices. In these two places we therefore retained it, but setting the attribute is now effective only for the duration of the call. This is the only behaviour one would normally expect; some users may have problems because of this incompatibility (we hope to avoid that by properly advertising it), but we believe that many more would have problems if we kept the counterintuitive version that we had before. Passing attribute values to constructors is not ambiguous and is very handy, so it will remain supported.
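What "effective only for the duration of the call" means can be sketched in pure Python. SketchFilter below is a hypothetical class that keeps positive numbers; it is not an Orange filter and only illustrates saving the attribute, overriding it for one call, and restoring it afterwards.

```python
class SketchFilter(object):
    """Hypothetical filter that keeps positive numbers; not an Orange class."""
    def __init__(self, negate=0):
        self.negate = negate

    def __call__(self, data, **overrides):
        # Remember the current values, apply the overrides, and restore
        # the old values when the call is over, even if it fails.
        saved = dict((name, getattr(self, name)) for name in overrides)
        self.__dict__.update(overrides)
        try:
            return [x for x in data if (x > 0) != bool(self.negate)]
        finally:
            self.__dict__.update(saved)


keep_positive = SketchFilter()
data = [-2, -1, 1, 2]

negated = keep_positive(data, negate=1)   # negate applies to this call only
regular = keep_positive(data)             # negate is back to 0
```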

Type Checking for Built-in Attributes

Instances of Orange's classes have two kinds of attributes. Orange's classes are written in C++, and most of their fields are exported to Python, where they are seen as the instances' attributes. These attributes are "built-in" and are stored as native C++ objects (as opposed to the attributes added by the user, which are the subject of the following section). Thus, orange.Filter_index's attribute value, which poses as an integer in Python, is stored as an ordinary C++ int, and orange.Variable's name is internally an STL string.

Python does not check the types of attribute values. If a Python class expects a certain attribute to be of a certain type, this is not checked when the attribute value is set. Potential errors are discovered only when the attribute is used. Not so in Orange. Since built-in attributes are stored as C++ structures, they need to be converted when they are set (or read), so passing a value of the wrong type is discovered immediately. If data is an orange.ExampleTable, setting its domain to a value of the wrong type will yield an error.

>>> data.domain=2
Traceback (most recent call last):
  File "<interactive input>", line 1, in ?
TypeError: invalid parameter type for 'ExampleTable.domain', (expected 'Domain', got 'int')
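A pure Python class can imitate this kind of check with a property. The sketch below is only an analogy: Orange performs the check while converting the value to a C++ object, whereas here it is an explicit isinstance test. SketchDomain and SketchTable are hypothetical names.

```python
class SketchDomain(object):
    """Hypothetical stand-in for a domain description."""


class SketchTable(object):
    """Hypothetical table whose 'domain' attribute is checked on assignment."""
    def __init__(self, domain):
        self.domain = domain  # goes through the property setter below

    @property
    def domain(self):
        return self._domain

    @domain.setter
    def domain(self, value):
        if not isinstance(value, SketchDomain):
            raise TypeError(
                "invalid parameter type for 'SketchTable.domain' "
                "(expected 'SketchDomain', got '%s')" % type(value).__name__)
        self._domain = value


table = SketchTable(SketchDomain())
try:
    table.domain = 2          # rejected at assignment time
except TypeError as error:
    message = str(error)
```

The error is raised at the assignment itself, not later when the attribute is used.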

Warnings While Setting Attributes

Python classes do not have a fixed set of attributes. You can create new attributes by assignment.

>>> class A:
...     pass
...
>>> a = A()
>>> a.x = 12

The ability to add attributes as needed is a really neat feature of Python, which enables extensibility and reusability unparalleled by most (if not all) other object-oriented languages. Orange supports this as well and it can be put to good use, for instance, in classification trees, where a testing procedure can use additional attributes to count the number of misclassifications for each node.

As described in the previous section, Orange's classes are written in C++, and a rather complex machinery is needed to support adding new attributes. Allowing this, however, can cause nasty bugs in Python scripts for Orange. Orange's classes have many attributes with long names and it is quite easy to misspell them. For instance, if you want a Bayesian classifier to use m-estimate for probability, you need to set its unconditionalEstimatorConstructor. Misspell it and you will create a new attribute with your misspelled variation of the original name. The learner will not see it and will use the default value instead.

Because such errors are common and user-defined attributes are relatively seldom needed, Orange will trigger a warning when such attributes are created.

>>> b=orange.BayesLearner()
>>> b.g=12
__main__:1: AttributeWarning: 'g' is not a builtin attribute of 'BayesLearner'

This, on the other hand, can be annoying when setting a new attribute is not a mistake. There are two ways to avoid the unwanted warnings. One is to use Python's standard warning filters.

import warnings
warnings.filterwarnings("ignore", "", orange.AttributeWarning)

This is a rather dangerous practice; it is much safer to disable only the warnings that occur in a particular module and/or with a particular Orange class or module. The following disables warnings in orng modules.

warnings.filterwarnings("ignore", "", orange.AttributeWarning, "orng.*")

The complete documentation of the module warnings is included in Python's distribution.
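In Python versions that provide warnings.catch_warnings, the filter can also be limited to a block of code, since the previous filters are restored when the block ends. The sketch below uses a hypothetical SketchAttributeWarning class as a stand-in for orange.AttributeWarning so that it runs without Orange; record=True is used only to show which warnings got through.

```python
import warnings


class SketchAttributeWarning(UserWarning):
    """Hypothetical stand-in for orange.AttributeWarning."""


def set_unknown_attribute():
    # Placeholder for code that sets an attribute that is not built-in.
    warnings.warn("'g' is not a builtin attribute", SketchAttributeWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # report every warning ...
    # ... except the attribute warnings, which are ignored in this block.
    warnings.filterwarnings("ignore", "", SketchAttributeWarning)
    set_unknown_attribute()           # silenced
    warnings.warn("something else")   # still reported

# Outside the block the previous filters are in effect again.
```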

The best way to use new attributes without disabling the otherwise beneficial warnings is to set them not in the usual way but through the method setattr.

>>> b.setattr("i", 15)

This has the exact same effect as b.i = 15, only without warnings.
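The interplay between ordinary assignment and setattr can be imitated in pure Python. The following sketch only illustrates the idea and is not how Orange's C++ classes are implemented; SketchBayesLearner, SketchAttributeWarning and the single "built-in" attribute m are assumptions made for the example.

```python
import warnings


class SketchAttributeWarning(UserWarning):
    """Hypothetical stand-in for orange.AttributeWarning."""


class SketchBayesLearner(object):
    """Hypothetical class with one 'built-in' attribute, m."""
    _builtin = ("m",)

    def __init__(self):
        self.m = 0

    def __setattr__(self, name, value):
        # Ordinary assignment: warn if the name is not a built-in attribute.
        if name not in self._builtin:
            warnings.warn("'%s' is not a builtin attribute of '%s'"
                          % (name, type(self).__name__),
                          SketchAttributeWarning, stacklevel=2)
        object.__setattr__(self, name, value)

    def setattr(self, name, value):
        # Same effect as ordinary assignment, but without the warning.
        object.__setattr__(self, name, value)


b = SketchBayesLearner()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    b.g = 12             # warns: 'g' is not a builtin attribute
    b.setattr("i", 15)   # silent
```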

Orange Lists

Orange has many objects that behave like lists, but aren't ordinary lists. The most widely seen objects of this kind are orange.ExampleTable, orange.Example and orange.Domain. There is nothing really special about them, but we have seen that many users tend to have problems with them. At certain points, they are reluctant to do typical list-like operations on them. For example, to iterate through the attributes of the data.domain, many write

for i in range(len(data.domain)):
    attr = data.domain[i]
    ...

while it is obviously more elegant to use data.domain like any other list:

for attr in data.domain:
    ...

Similarly, to print out the first ten examples of an orange.ExampleTable, you can use slicing.

for example in data[:10]:
    print example

On the other hand, these are not ordinary Python lists and do not necessarily support all operations supported by Python lists. Sorting, for instance, is typically unsupported, as it makes no sense or would violate the object's consistency.

There are two efficient ways to convert an Orange list to an ordinary Python list. One is to instruct Python to create a list from your object:

listofexamples = list(data)

This is equivalent to

listofexamples = [example for example in data]

or (if you are not familiar with list comprehensions)

listofexamples = []
for example in data:
    listofexamples.append(example)

The other is to use the method native. This method is not offered by all list-like objects (but is also implemented for some objects that are not list-like). Method native can have an argument specifying the level of conversion. The orange.ExampleTable's method native can return a list of instances of orange.Example (as list does, described above). If so requested, examples themselves can be converted to lists of values. Values, in turn, can be left as orange.Value or converted to Python integers and strings. The arguments to method native depend on the class.
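The list-like behaviour and the two conversions can be sketched with a small pure Python container. SketchTable is a hypothetical class, not Orange's ExampleTable, and the meaning given to the level argument of its native method is an assumption made for this example only.

```python
class SketchTable(object):
    """Hypothetical list-like container; not Orange's ExampleTable."""
    def __init__(self, rows):
        self._rows = [tuple(row) for row in rows]

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        # Integers return one row; slices return a new SketchTable.
        if isinstance(index, slice):
            return SketchTable(self._rows[index])
        return self._rows[index]

    def native(self, level=1):
        # Assumed meaning of 'level' for this sketch only:
        # 0 keeps the rows as they are, 1 converts each row to a list.
        if level == 0:
            return list(self._rows)
        return [list(row) for row in self._rows]


data = SketchTable([(1, "a"), (2, "b"), (3, "c")])

first_two = [row for row in data[:2]]   # slicing, then iteration
as_list = list(data)                    # ordinary Python list of rows
as_native = data.native(1)              # rows converted to lists as well
```

Like the Orange lists described above, SketchTable supports indexing, slicing and iteration but has no sort method; sorting is possible only after conversion to an ordinary list.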

A Few Things You Might Miss

Pickle and Deep-copy

Orange classes cannot be pickled (yet). This also prevents deep-copying, which works through pickling. Shallow copying can sometimes be achieved by calling the constructor with the object that is to be copied as an argument. Deep-copying is, in our experience and in the experience of other users, seldom needed and can be circumvented. Pickling, in contrast, would be very useful.

Named Arguments

Orange does not (yet) support named arguments to functions. In Python, function arguments can be given by position or by keywords.

>>> def f(a, b):
...     return a-b
...
>>> f(7, 5)
2
>>> f(b=5, a=7)
2

Orange functions do not accept the second way of passing arguments. The main reason for this is that we started to write the interface between Orange and Python when keyword passing of arguments was not yet supported in Python. Supporting it now would require a lot of effort on our side. Since Orange's functions generally have only a few arguments, keyword-style calling is not really needed, so solving this problem is rather at the bottom of the priority list.
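What a positional-only function looks like from the caller's side can be imitated in pure Python by accepting only *args. This is a sketch of the behaviour, not of Orange's C++ argument parsing, and the exact wording of the error raised by Orange functions may differ.

```python
def f(*args):
    """Hypothetical function that takes positional arguments only."""
    a, b = args
    return a - b


positional = f(7, 5)      # works as before

try:
    f(b=5, a=7)           # keywords are rejected
except TypeError as error:
    keyword_error = str(error)
```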