Classifier from Attribute

Classifiers from attribute predict the class based on the value of a single attribute. While they can be used for making predictions, they actually play a different, yet important role in Orange: they are used not to predict class values but to compute attribute values. For instance, when a continuous attribute is discretized and replaced by a discrete attribute, an instance of a classifier from attribute takes care of computing the new value automatically whenever it is needed. Similarly, a classifier from attribute usually decides the branch when an example is classified in a decision tree.

There are two classifiers from attribute: the simpler ClassifierFromVarFD assumes that the example is from some fixed domain, while the safer ClassifierFromVar does not. You should primarily use the latter, especially since it uses a caching scheme that makes it practically as fast as the former.

Both classifiers can be given a transformer that modifies the value. In discretization, for instance, the transformer is responsible for computing the discrete interval for a continuous value of the original attribute.
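To make the role of such a transformer concrete, here is a minimal sketch in plain Python that does not use Orange at all. The helper make_interval_transformer and the cut-off points are hypothetical, and placing a value that equals a cut-off point into the upper interval is an assumption of this sketch, not a statement about Orange's discretizers.

```python
from bisect import bisect_right


def make_interval_transformer(cut_points):
    """Return a function that maps a continuous value to an interval index."""
    points = sorted(cut_points)

    def transform(value):
        # bisect_right counts the cut-off points that are <= value; that
        # count is the index of the interval the value falls into.
        return bisect_right(points, value)

    return transform


# Two cut-off points define three intervals with indices 0, 1 and 2.
to_interval = make_interval_transformer([2.5, 5.0])
indices = [to_interval(v) for v in (1.0, 2.5, 4.9, 7.3)]
```

The transformer returns an index rather than a label, which matches the index-based convention that callback transformers use later in this section.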

ClassifierFromVar

Attributes

whichVar
The descriptor of the attribute whose value is to be returned.
transformer
The transformer for the value. It should be a class derived from TransformValue, but you can also use a callback function.
distributionForUnknown
The distribution that is returned when the value of whichVar is undefined.

When given an example, ClassifierFromVar returns transformer(example[whichVar]). whichVar can be an ordinary attribute, a meta attribute, or an attribute that is not defined for the example but has a getValueFrom that can be used to compute the value. If none of these succeeds, or if the value found is unknown, a Value of subtype Distribution containing distributionForUnknown is returned.
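The lookup order described above can be modelled in plain Python. This is only a sketch of the described behaviour, not Orange's implementation: Var, classify_from_var, the dictionaries standing in for an example and its meta values, and the use of None for an unknown value are all hypothetical.

```python
UNKNOWN = None  # hypothetical stand-in for Orange's "unknown" value


class Var(object):
    """Hypothetical attribute descriptor with an optional getValueFrom."""

    def __init__(self, name, get_value_from=None):
        self.name = name
        self.get_value_from = get_value_from


def classify_from_var(example, metas, which_var, transformer, dist_for_unknown):
    """Try an ordinary attribute, then a meta attribute, then getValueFrom."""
    if which_var.name in example:
        value = example[which_var.name]
    elif which_var.name in metas:
        value = metas[which_var.name]
    elif which_var.get_value_from is not None:
        value = which_var.get_value_from(example)
    else:
        value = UNKNOWN
    if value is UNKNOWN:
        # Nothing worked, or the value is unknown: return the fallback.
        return dist_for_unknown
    return transformer(value)


fallback = {"1": 0.5, "not 1": 0.5}
example, metas = {"e": 2}, {"w": 7}
double = lambda v: v * 2

ordinary = classify_from_var(example, metas, Var("e"), double, fallback)
meta = classify_from_var(example, metas, Var("w"), double, fallback)
computed = classify_from_var(
    example, metas, Var("d", lambda ex: ex["e"] + 1), double, fallback)
missing = classify_from_var(example, metas, Var("z"), double, fallback)
```

In this model the transformer is applied to whichever value is found, and only the fallback distribution bypasses it.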

The class stores the domain version of the last example and the attribute's position in that domain. If consecutive examples come from the same domain (which is usually the case), ClassifierFromVar is just two simple ifs slower than ClassifierFromVarFD.
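The caching idea can be illustrated with a small plain-Python model. It is a sketch under stated assumptions: Domain, CachedPositionLookup and the searches counter are hypothetical names, and the two checks in __call__ only illustrate the kind of test a cache like this needs; they are not taken from Orange's source.

```python
class Domain(object):
    """Hypothetical domain: attribute names plus a unique version number."""

    _counter = 0

    def __init__(self, names):
        Domain._counter += 1
        self.version = Domain._counter
        self.names = list(names)


class CachedPositionLookup(object):
    """Remember the attribute's position for the last domain seen."""

    def __init__(self, name):
        self.name = name
        self.last_version = None
        self.last_position = None
        self.searches = 0  # counts slow-path searches, for illustration only

    def __call__(self, domain, values):
        if domain.version != self.last_version:
            # Slow path: the domain changed, so search for the attribute.
            self.searches += 1
            self.last_version = domain.version
            self.last_position = (
                domain.names.index(self.name)
                if self.name in domain.names else None)
        if self.last_position is None:
            return None  # the attribute is not in this domain
        return values[self.last_position]


d1 = Domain(["a", "b", "e"])
d2 = Domain(["e", "y"])
lookup = CachedPositionLookup("e")
first = [lookup(d1, row) for row in (["1", "2", "3"], ["2", "2", "4"])]
searches_same_domain = lookup.searches
second = lookup(d2, ["1", "0"])
searches_after_switch = lookup.searches
```

As long as consecutive examples share a domain, the search runs once and every later call costs only the two comparisons.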

As you might have guessed, the crucial component here is the transformer. For the sake of demonstration, let us load the Monk 1 dataset and construct an attribute e1 that has the value "1" when e is "1", and "not 1" when e has any other value. There are many ways to do this, and the same problem is covered in other places in the Orange documentation. Although the way presented here is not the simplest, it serves to demonstrate how ClassifierFromVar works.

part of classifierFromVar.py (uses monk1.tab)

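    import orange

    data = orange.ExampleTable("monk1")
    e = data.domain["e"]
    e1 = orange.EnumVariable("e1", values = ["1", "not 1"])

    # The transformer receives and returns value indices, not strings:
    # index 0 of e is "1"; indices 0 and 1 of e1 are "1" and "not 1".
    def eTransformer(value):
        if int(value) == 0:
            return 0
        else:
            return 1

    # Compute e1 from e through the transformer.
    e1.getValueFrom = orange.ClassifierFromVar()
    e1.getValueFrom.whichVar = e
    e1.getValueFrom.transformer = eTransformer

    # Converting the examples to the new domain triggers the computation of e1.
    data2 = data.select(["a", "b", "e", e1, "y"])
    for i in data2:
        print i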

First, you might have noticed that transformer, an attribute of the pure C++ object ClassifierFromVar, has been assigned a Python function. As you can learn by reading the documentation on callback functions, the function is automatically wrapped into a C++ class that converts the arguments to Python and the result back. (Not that you need to know about it. Just use it and be happy that it works.)

The problem here is that eTransformer doesn't get the nice instances of orange.Value that you are used to. You cannot compare the value to a string: the function cannot begin with if value == "1", since the value has no associated attribute descriptor that would "understand" the string "1". Instead, you need to use integer indices. Since the values of e are "1", "2", "3", "4", index 0 corresponds to the value "1". The same goes for returned values: the values of e1 are "1" and "not 1", in this order, so returning 0 means "1" and returning 1 means "not 1".
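The index convention can be checked without Orange. The sketch below hard-codes the value lists given in the text; E_VALUES, E1_VALUES and e_transformer are illustrative names, with e_transformer mirroring eTransformer from the script above.

```python
# Value lists as described in the text, in declaration order.
E_VALUES = ["1", "2", "3", "4"]
E1_VALUES = ["1", "not 1"]


def e_transformer(index):
    # Index 0 of e is "1"; every other index maps to "not 1" (index 1 of e1).
    return 0 if int(index) == 0 else 1


# Pair each value of e with the value of e1 that the transformer selects.
mapping = [(E_VALUES[i], E1_VALUES[e_transformer(i)])
           for i in range(len(E_VALUES))]
```

Listing the pairs this way makes it easy to confirm that only the first value of e maps to "1".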

Having written the transformer, the rest is trivial - we assign a ClassifierFromVar to the new attribute's getValueFrom, and set its whichVar to e and transformer to eTransformer.

To check the results, we construct a new example table containing only the attributes a, b and e, the new attribute e1 and the class attribute. During example conversion, the value of e1 is computed by calling ClassifierFromVar, and the overall effect is that for each example ex, e1 has the value eTransformer(ex[e]).

ClassifierFromVarFD

ClassifierFromVarFD is very similar to ClassifierFromVar, except that the attribute is not given as a descriptor (like whichVar) but as an index. The index can be either the position of the attribute in the domain or a meta-id. Given that ClassifierFromVarFD is practically no faster than ClassifierFromVar (and may in the future even be merged with it), you should seldom need to use this class.

Attributes

domain (inherited from ClassifierFD)
The domain on which the classifier operates.
position
The position of the attribute in the domain or its meta-id.
transformer
The transformer for the value.
distributionForUnknown
The distribution that is returned when the attribute's value is undefined.

When an example is passed to ClassifierFromVarFD, it is first checked whether it is from the correct domain; an exception is raised if not. If the domain is OK, the corresponding attribute value is retrieved, transformed and returned.
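The check-then-fetch behaviour described above can be sketched in plain Python. FixedDomainLookup is a hypothetical model: it identifies a domain by object identity and raises ValueError, both of which are assumptions of this sketch, since the text only says that an exception is raised.

```python
class FixedDomainLookup(object):
    """Fetch a value by position, but only for examples from one domain."""

    def __init__(self, domain, position, transformer=None):
        self.domain = domain
        self.position = position
        self.transformer = transformer

    def __call__(self, example_domain, values):
        if example_domain is not self.domain:
            raise ValueError("example is from a wrong domain")
        value = values[self.position]
        return self.transformer(value) if self.transformer else value


domain = ["a", "b", "e"]
other_domain = ["e", "y"]
lookup = FixedDomainLookup(
    domain, domain.index("e"),
    transformer=lambda v: "1" if v == "1" else "not 1")

result = lookup(domain, ["1", "2", "3"])

try:
    lookup(other_domain, ["3", "0"])
    rejected = False
except ValueError:
    rejected = True
```

Because only a position is stored, this model has no way of finding the attribute in a different domain, which is why it has to reject such examples.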

ClassifierFromVarFD's twin brother, ClassifierFromVar, can also handle attributes that are neither in the example's domain nor among its meta-attributes, but can be computed from the example through their getValueFrom. Since ClassifierFromVarFD does not store an attribute descriptor but only an index, it cannot offer this functionality.

To rewrite the above script to use ClassifierFromVarFD, we need to set the domain and assign the index of e to position (the equivalent of setting whichVar in ClassifierFromVar). The initialization of ClassifierFromVarFD thus goes like this:

part of classifierFromVar.py (uses monk1.tab)

    e1.getValueFrom = orange.ClassifierFromVarFD()
    e1.getValueFrom.domain = data.domain
    e1.getValueFrom.position = data.domain.attributes.index(e)
    e1.getValueFrom.transformer = eTransformer