Module orngCI implements classes and functions that support constructive induction (CI). It includes minimal-complexity function decomposition and a much more general minimal-error function decomposition. Both can be used as feature induction algorithms or as the standalone learning algorithm HINT (Zupan, 1997; Zupan et al, 1998). In addition, the module provides an interface to a generalized version of Kramer's method (Kramer, 1994) and several other CI-related algorithms.
Constructive induction takes a subset of attributes, which we call bound attributes, and creates a new attribute whose values are computed from the values of the bound attributes. Orange supports the so-called data-driven constructive induction, where learning examples are used to decide how the new attribute is computed from the bound ones.
Feature constructors (orange objects that construct features) are thus given examples, a list of bound attributes, and, optionally, an id of the weight attribute. They construct a new attribute and return a tuple with the new feature and a value that estimates its quality.
Complex feature constructors, such as those for minimal-error decomposition and for Kramer's algorithm, show the flexibility of Orange's component-based approach. On the other hand, the hierarchy of components that need to be put together to construct a workable feature inducer is too complex even for an experienced user. The main mission of orngCI is thus to help the user by "flattening" the hierarchy. Defaults are provided for all the components, so the user only needs to give the specific parts of the hierarchy.
For instance, let fim be an instance of orange.FeatureByIM, a feature constructor for minimal-error function decomposition. Its most crucial parameter, m, is "hidden" in fim.clustersFromIM.columnAssessor.m: parameter m is an attribute of an m-estimate based column assessor component of the column-quality-assessor based algorithm for column clustering. To save the user from having to remember this, orngCI offers its own class. Now, if fim is an instance of orngCI.FeatureByIM, the user only needs to set fim.m. Before the actual feature construction takes place, orngCI.FeatureByIM constructs an appropriate orange.FeatureByIM and sets its clustersFromIM.columnAssessor.m to fim.m. Constructing such a structure and putting the user-supplied components at the right places is called deflattening.
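For illustration, a minimal sketch of the flattened interface (passing m as a constructor keyword is an assumption; setting fim.m after construction amounts to the same thing):

    import orange, orngCI
    fim = orngCI.FeatureByIM(m=2.0)   # same effect as: fim = orngCI.FeatureByIM(); fim.m = 2.0
    # fim can then be called with examples and a list of bound attributes, e.g.
    # feature, quality = fim(data, ["a", "b"])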
A general principle is that if you define a component, you need to define its subcomponents as well. The default columnAssessor is orange.ColumnAssessor_m; if you leave it alone, all its subcomponents are set as well. However, if you set columnAssessor to orange.ColumnAssessor_Measure, it is your responsibility to set its attribute "measure", either by specifying it as a "measure" argument to FeatureByIM or by setting it directly in columnAssessor.measure.
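A sketch of the two ways of supplying the measure described above (the keyword-argument form is an assumption about orngCI.FeatureByIM's interface; imports as in the previous sketch):

    # give the measure directly; columnAssessor then defaults to ColumnAssessor_Measure
    fim = orngCI.FeatureByIM(measure=orange.MeasureAttribute_info())

    # or set the component explicitly and take care of its subcomponent yourself
    fim = orngCI.FeatureByIM(columnAssessor=orange.ColumnAssessor_Measure())
    fim.columnAssessor.measure = orange.MeasureAttribute_info()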
The rest of this section is intended for advanced users.
The function that performs deflattening is always called createInstance; it constructs and wires together the orange classes that perform the feature construction. The constructed instance is also stored in self's field instance.
Deflattening is initiated by the call operator. The call operator is given examples, bound attributes and, optionally, the id of the weight attribute, and returns a constructed attribute and an assessment of its quality. To do its job, the call operator needs an instance of a corresponding Orange feature constructor. Constructing such instances at each invocation of the call would be too costly, therefore a kind of caching is implemented. When the call operator is invoked for the first time, createInstance is called to deflatten the structure and the result is stored in self.instance. In further calls, the stored self.instance is used again. However, when any of the class's attributes that affect the feature construction (such as fim.m, see the example above) is modified, self.instance is set to None and a new instance will be constructed at the next call.
Now, if you want to, say, tune fim.m, you can change it and cause reinstantiations of the corresponding orange classes.
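A sketch of such a loop (the dataset file name and the choice of m values are illustrative assumptions):

    import orange, orngCI
    data = orange.ExampleTable("monk1")            # assumed file name for the Monk 1 dataset
    bound = list(data.domain.attributes)[:2]       # the first two attributes
    fim = orngCI.FeatureByIM()
    attrs = []
    for m in [0.1, 0.5, 1.0, 2.0]:
        fim.m = m                                  # invalidates fim.instance
        attrs.append(fim(data, bound)[0])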
This will construct features from the first two attributes of the domain, using different values of m. (It would be a blasphemy not to publish a shorter way to write the above code fragment: the last four lines can be replaced by attrs = [fim(data, bound)[0] for fim.m in [0.1, 0.5, 1.0, 2.0]].) However, at each call to fim, createInstance is called and a new set of components is constructed, since setting m resets them. To make the program faster, you should set m directly on the cached instance (fim.instance.clustersFromIM.columnAssessor.m) instead of on fim itself, as sketched below.
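A hedged sketch of the faster variant; it assumes the cached instance exposes the path clustersFromIM.columnAssessor.m described above:

    # data and bound as in the previous sketch
    fim = orngCI.FeatureByIM()
    fim(data, bound)                               # the first call deflattens and caches fim.instance
    attrs = []
    for m in [0.1, 0.5, 1.0, 2.0]:
        # change m on the cached orange components; fim.instance is not rebuilt
        fim.instance.clustersFromIM.columnAssessor.m = m
        attrs.append(fim(data, bound)[0])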
To construct a feature, you need to call a constructor. The call operator expects two or three arguments: examples, bound attributes and, optionally, an id of a weight meta attribute. It returns a tuple containing the attribute and its quality estimation. This is the same for all feature constructors in orngCI, although some may ignore some arguments or return useless quality assessments.
The method of quality estimation depends upon the algorithm used - it can be a decrease of error, a measure of impurity, etc. Qualities returned by different feature constructors are not comparable. The quality can be either negative or positive; in any case, higher numbers mean better attributes. If the underlying measure is one where lower values mean better attributes, the returned quality is negated, so it will have a negative sign.
The famous Monk 1 data set (Thrun et al, 1991) has the goal concept y := a=b or e=1. When joining attributes 'a' and 'b', the new attribute should express equality of a and b. Let us see if the minimal-complexity function decomposition can find the appropriate function.
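A sketch of the call (the dataset file name "monk1" is an assumption; the constructor returns a (feature, quality) pair, as described above):

    import orange, orngCI
    data = orange.ExampleTable("monk1")
    ab, quality = orngCI.FeatureByMinComplexity()(data, ["a", "b"])
    print ab.name, list(ab.values), quality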
Note that we specified the bound attributes by names. This is not the only way: if you want, you can give attribute descriptors, or even their indices in the domain. Thus, knowing that "a" and "b" are the first two attributes, the above call could equally well use descriptors or indices, or any combination of names, descriptors and indices.
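For illustration, the same call with attribute descriptors and with indices (assuming, as above, that "a" and "b" are the first two attributes of the domain):

    ab, quality = orngCI.FeatureByMinComplexity()(data, [data.domain["a"], data.domain["b"]])
    ab, quality = orngCI.FeatureByMinComplexity()(data, [0, 1])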
Now for the results. Values of the new attribute show to which values of the bound attributes they correspond. The new attribute is binary: its first value is named '1-1+2-2+3-3' and the other '1-2+1-3+2-1+2-3+3-1+3-2'. It is obvious that this is the correct feature. Its quality is -2; minimal-complexity function decomposition prefers attributes with fewer values, so the higher the number of values, the more negative (and thus lower) the quality.
It makes sense to rename the values of the constructed feature. In our case, we shall call the first value "eq" and the second "diff".
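A sketch of the renaming; that the value names can simply be re-assigned through the feature's values attribute is an assumption:

    ab.values = ["eq", "diff"]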
If any of the bound attributes has values that include '-' or '+', the constructed attribute's values will be named "c1", "c2", "c3"...
Constructed features can be added to the domain and used in further processing of the data, such as, for instance, learning. If that is what you want, you do not need to read this section any further: if you have followed the above example, you can conclude it as follows.
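A sketch using plain Orange calls (orngCI also provides helper functions for this, described near the end of this section):

    # 'data' is the Monk 1 table and 'ab' the feature constructed above
    newDomain = orange.Domain(list(data.domain.attributes) + [ab], data.domain.classVar)
    data2 = orange.ExampleTable(newDomain, data)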
This adds the attribute to the domain and returns a new example table.
The feature knows how to compute itself from the values of the attributes it is constructed from. The usual Orange mechanism is used for this: the new feature has a getValueFrom
pointing to a classifier that computes the feature's value. In short, when some method has an example and needs a value of a particular attribute, but this attribute is not in the example's domain, it can check whether the attribute has a getValueFrom
defined. If it has, the method calls it, passing the example as an argument. The getValueFrom
either computes the attribute's value or returns don't know.
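A small sketch of this mechanism at work, calling the classifier stored in getValueFrom directly:

    example = data[0]
    print example["a"], example["b"], ab.getValueFrom(example)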
For most of the features constructed by methods in orngCI, getValueFrom will contain some variant of orange.ClassifierByLookupTable. In the above case, it is a classifier by lookup table based on two attributes. Values of the new attribute are read from its lookupTable. Each entry in the table corresponds to a combination of values of the bound attributes; the first corresponds to (1, 1), the second to (1, 2), then to (1, 3), (2, 1), (2, 2)..., where the first digit is for variable1 and the second for variable2. Value "eq" occurs at the first, the fifth and the ninth elements of the table, which correspond to (1, 1), (2, 2) and (3, 3). This is exactly what we expected for the Monk 1 dataset.
If you need to manually correct the feature, you can change the lookupTable:
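For instance, a hedged sketch (the index and the value are purely illustrative; the table entries are Values of the constructed feature):

    # purely illustrative: overwrite the first entry with the value "eq"
    ab.getValueFrom.lookupTable[0] = orange.Value(ab, "eq")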
This would work for most cases, but it would fail when you start to mess with probability distributions of the new feature. The complete description of classifiers by lookup table is, however, out of scope of this text.
Let us show an example of what getValueFrom
can do. We shall compute the class distributions for different values of the new attribute:
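A sketch using a contingency matrix, which pulls the values of ab through its getValueFrom:

    cont = orange.ContingencyAttrClass(ab, data)
    for value in ab.values:
        print value, cont[value]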
This works even though the new attribute, ab, is not in the domain. Its values are computed on the fly, while counting, and are not stored anywhere. What the results say is that for the first value ("eq"), all examples are in the second class ("1"), while when the values of a and b are different, 216 examples (exactly 3/4) are in the first class and the other 72 in the second. Similarly, you could assess the quality of the new attribute by, for instance, its information gain. Note that the automatic computation won't always work; this is a precaution for cases when the algorithm would require many recomputations of the value on one and the same example (ReliefF, for instance, is one such measure). In such cases, you add the new attribute to a domain and compute a new dataset in an ExampleTable, as shown above.
Feature inducers can be used to remove (or merge) redundant attribute values. In Monk 1, the attribute "e" has four values, but the goal concept only asks whether the value is "1" or not. Thus, values "2", "3" and "4" are equivalent. If minimal complexity decomposition is called to construct a new attribute from "e", it will easily remove the redundancies:
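A sketch of the call:

    e2, quality = orngCI.FeatureByMinComplexity()(data, ["e"])
    print list(e2.values), quality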
We can check the lookupTable
again.
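Continuing the sketch above:

    print e2.getValueFrom.lookupTable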
orngCI.FeatureByMinComplexity has two settable components.

The graph-coloring component: since only one coloring algorithm (orange.ColorIG_MCF) is available, there is no point in changing this (more accurately, there is nothing to change it to).

completion: the default is orngCI.FeatureByMinComplexity.CompletionByBayes; the values are determined using a Bayesian classifier. The alternatives are orngCI.FeatureByMinComplexity.CompletionByDefault, where the most frequent value is used for imputation, and orngCI.FeatureByMinComplexity.NoCompletion, where the values are left undefined.
Class orngCI.FeatureByIM implements minimal-error function decomposition. The algorithm builds an incompatibility matrix (also called a partition matrix) and merges its columns until a stopping criterion is reached. In Zupan's method, columns are compared by how the merge would affect the average m-error estimate over the elements of the matrix. Merging is stopped when the error stops decreasing. In Orange, different similarity measures and stopping criteria can be used.
Since FeatureByIM is analogous to FeatureByMinComplexity, orngCI.FeatureByMinError is provided as an alias for orngCI.FeatureByIM
. No class with this name appears in orange
, though.
orngCI.FeatureByIM
has a number of settable components. Their actual positions after deflattening are given in parentheses.
IMconstructor: if left None, orange.IMBySorting will be used.

clustersFromIM: the default is ClustersFromIMByAssessor, which uses a columnAssessor to evaluate distances between the columns and merges the most similar columns until the stopCriterion says it's enough.

stopCriterion (clustersFromIM.stopCriterion): the default is orange.StopIMClusteringByAssessor_noProfit, which stops the merging when the gain is negative or when it is smaller than the prescribed proportion. Alternatives are orange.StopIMClusteringByAssessor_binary, which stops merging when only two columns (future values) are left, and orange.StopIMClusteringByAssessor_n, which stops at n columns.

minProfitProportion: the proportion used by the "no profit" criterion; the default is 0.

n: if n is specified, the default stopCriterion is orange.StopIMClusteringByAssessor_n.

binary: if binary is set, orange.StopIMClusteringByAssessor_binary is used as the stopping criterion.

columnAssessor (clustersFromIM.columnAssessor): the default is orange.ColumnAssessor_m, which uses m-error estimation. Alternatives are orange.ColumnAssessor_Laplace, which minimizes the Laplace estimation of error of matrix elements; orange.ColumnAssessor_Measure, which optimizes an attribute quality measure (such as information gain) and would usually be used in conjunction with orange.StopIMClusteringByAssessor_binary or orange.StopIMClusteringByAssessor_n; orange.ColumnAssessor_Kramer, which optimizes Kramer's impurity measure (it is defined for binary classes only, and since the profit is non-positive, the default stopping criterion is changed to orange.StopIMClusteringByAssessor_binary, which you may replace by orange.StopIMClusteringByAssessor_n); and orange.ColumnAssessor_Relief, which uses a measure similar to ReliefF and has a local extreme, so it can be used with any stopping criterion.

measure (clustersFromIM.columnAssessor.measure): an attribute quality measure, used when columnAssessor is of type orange.ColumnAssessor_Measure. If this option is given, the default columnAssessor is orange.ColumnAssessor_Measure.

completion: the default is orngCI.FeatureByIM.CompletionByBayes; the values are determined using a Bayesian classifier. The alternatives are orngCI.FeatureByIM.CompletionByDefault, where the most frequent value is used for imputation, and orngCI.FeatureByIM.NoCompletion, where the values are left undefined.
You should pay attention to make sensible combinations of the arguments. If you, for instance, combine the column assessor based on Kramer's impurity measure with the ordinary "no profit" stop criterion (by setting both explicitly), you will probably get features that are simply Cartesian products of the bound attributes, since Kramer's impurity measure is non-decreasing. orngCI does not warn about nonsensical settings.
As an example, let us check how minimal-error decomposition performs on Monk 1. We shall check its performance with different values of m, and for each we shall print out the values of the constructed feature.
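A hedged sketch of such an experiment (passing m as a keyword argument is an assumption about orngCI.FeatureByIM's interface; data is the Monk 1 table loaded earlier):

    for m in [0.01, 0.1, 0.5, 1.0, 2.0]:
        fim = orngCI.FeatureByIM(m=m)
        ab, quality = fim(data, ["a", "b"])
        print "m = %5.2f:" % m, list(ab.values)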
Results can vary due to random decisions in the procedure. In general, lower m's give correct features. This is not surprising: through m the user may inform the algorithm about the level of noise in the data. With higher m's, the algorithm will tend to oversimplify the feature, in our case, it will merge more than needed.
Let us see whether Kramer's impurity measure can compete with this.
It cannot. But what if we allow it to build a four-valued attribute?
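A sketch, assuming the column assessor and the number of values can be given as keyword arguments:

    # Kramer's impurity measure; the default stopping criterion then induces a binary feature
    fim = orngCI.FeatureByIM(columnAssessor=orange.ColumnAssessor_Kramer())
    ab, quality = fim(data, ["a", "b"])

    # the same, but allowing a four-valued feature
    fim4 = orngCI.FeatureByIM(columnAssessor=orange.ColumnAssessor_Kramer(), n=4)
    ab4, quality4 = fim4(data, ["a", "b"])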
As a more interesting test, let us see what happens if we construct an attribute with optimization of information gain.
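A sketch, assuming the measure and binary keyword arguments described above:

    fim = orngCI.FeatureByIM(measure=orange.MeasureAttribute_info(), binary=1)
    ab, quality = fim(data, ["a", "b"])
    print list(ab.values), quality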
(In this setting, orange.MeasureAttribute_info() estimates entropies in the elements of the incompatibility matrix and observes the changes in entropy with potential merging. This is not the standard information gain of an attribute but rather a form of non-myopic information gain measure.)
orngCI.FeatureByKramer implements Kramer's algorithm for feature construction (Kramer, 1994).
Basically, it computes a distribution of classes for each combination of values of the bound attributes and merges them, just like the minimal-error decomposition merges the columns of the incompatibility matrix - and, hopefully, ends up with a meaningful grouping. In the case of Monk 1, it extracts a 0:48 distribution for the combinations 1-1, 2-2 and 3-3, and 36:12 for the other combinations. It is straightforward for any clustering algorithm to see that the former combinations belong to one group and the latter to another.
This method is somewhat similar to minimal-error decomposition. If you sum all rows of the incompatibility matrix into a single row, you get a list of distributions corresponding to the combinations of values of the bound attributes - and that is exactly the data that Kramer's algorithm operates on. The remaining differences between the algorithms are only in the choice of components; while minimal-error estimation uses the change of the m-error estimate as a measure of distance between two values of the new attribute, Kramer's algorithm uses a special impurity measure (which can be, as Kramer himself states, replaced by any other impurity measure or even by some measure of similarity between two distributions).
Due to the similarity between the two methods, the structure of their components is similar. The data structure that corresponds to IM (the incompatibility matrix) is called ExampleDist. There is no component analogous to IMconstructor.
The clustering component (the counterpart of clustersFromIM) is ClustersFromDistributionsByAssessor, which uses a distributionAssessor to evaluate differences between the distributions and merges the most similar values until the stopCriterion says it's enough.

distributionAssessor: the default is orange.DistributionAssessor_Kramer, which uses Kramer's measure of impurity. Alternatives are orange.DistributionAssessor_Laplace, which minimizes the Laplace estimation of error of distributions; orange.DistributionAssessor_Measure, which optimizes an attribute quality measure (such as information gain); orange.DistributionAssessor_m, which assesses the quality of distributions using the m-estimate of error; and orange.DistributionAssessor_Relief, which uses a measure similar to ReliefF and has a local extreme, so it can be used with any stopping criterion. These play the same roles as the corresponding column assessors of the orngCI.FeatureByIM class. The choice of assessor also influences the default stopCriterion (see below).

stopCriterion: the rules are the same as for orngCI.FeatureByIM: when no n, minProfitProportion or binary are given, the stopCriterion depends upon the distributionAssessor. When Kramer's assessor is used, the default is orange.StopDistributionClustering_binary, which stops when only two values are left. Otherwise, orange.StopDistributionClustering_noProfit, which stops when the gain is negative or when it is smaller than the prescribed proportion, is used. The third alternative is orange.StopDistributionClustering_n, which stops at n values.

minProfitProportion: the default is 0.

n: if n is specified, the default stopCriterion is orange.StopDistributionClusteringByAssessor_n.

binary: if binary is set, orange.StopDistributionClustering_binary is used as the stopping criterion.

measure: an attribute quality measure, used when distributionAssessor is of type orange.DistributionAssessor_Measure. If this option is given, the default distributionAssessor is orange.DistributionAssessor_Measure.

completion: the default is orngCI.FeatureByKramer.CompletionByBayes; the values are determined using a Bayesian classifier. The alternatives are orngCI.FeatureByKramer.CompletionByDefault, where the most frequent value is used for imputation, and orngCI.FeatureByKramer.NoCompletion, where the values are left undefined.
The class usage is essentially the same as that of orngCI.FeatureByIM, except for the different defaults for distributionAssessor and stopCriterion. The following invokes the default Kramer's algorithm, which uses Kramer's measure of impurity for distribution quality assessment and induces binary concepts:
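A sketch, continuing the Monk 1 example:

    fk = orngCI.FeatureByKramer()
    ab, quality = fk(data, ["a", "b"])
    print list(ab.values), quality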
Setting m will replace the distribution estimator by orange.DistributionAssessor_m and, consequently, orange.StopDistributionClustering_noProfit will be used as the stop criterion:
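A sketch, assuming m can be passed as a keyword argument:

    fk = orngCI.FeatureByKramer(m=2.0)
    ab, quality = fk(data, ["a", "b"])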
The module also provides a constructor of random features, which builds features with the given number of values. The grouping of combinations of bound values is random and the estimated quality of the feature is a random value between 0 and 100, inclusive. The class is useful for comparisons - if a "real" method of constructive induction does not outperform it, it is of no use.
As an example, we shall induce 30-valued random features:
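A hedged sketch; the class name FeatureByRandom and its argument n are assumptions about this constructor's interface:

    fr = orngCI.FeatureByRandom(n=30)              # assumed class name and argument
    ab, quality = fr(data, ["a", "b"])
    print len(ab.values), quality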
orngCI.FeatureByCartesianProduct constructs an attribute which represents the Cartesian product of the bound attributes. Its quality is estimated by a given attribute quality estimator of type orange.MeasureAttribute. If none is given, all qualities are 0.
Let us construct a cartesian product-attribute and assess its quality using information gain:
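A sketch, assuming the measure can be given as a keyword argument:

    fc = orngCI.FeatureByCartesianProduct(measure=orange.MeasureAttribute_info())
    ab, quality = fc(data, ["a", "b"])
    print list(ab.values), quality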
orngCI.FeatureGenerator is a class that constructs a list of features using the given feature construction algorithm. Besides the feature inducer, you can also specify a subset generator - an object that generates the bound sets.
The default subset generator is orange.SubsetsGenerator_constSize(2), which returns all pairs of attributes. Orange provides several alternatives; the most useful is orange.SubsetsGenerator_minMaxSize(min, max), which generates all subsets within the specified minimal and maximal number of elements. You can define your own generators in Python, if you like.
The core of the class's code is really trivial. The class first checks whether the two fields are specified and provides the default for subsetGenerator if needed. Then it executes:
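A rough sketch of that core; the field names featureInducer and subsetGenerator follow the description above, and the exact return value of the real class may differ:

    # one (feature, quality) pair for each bound set produced by the generator
    features = [self.featureInducer(examples, bound, weightID)
                for bound in self.subsetGenerator(examples.domain.attributes)]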
We will use the class to induce new features from all attribute pairs in Monk 1 using Kramer's algorithm.
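A hedged sketch (keyword-argument names follow the component names above):

    gen = orngCI.FeatureGenerator(featureInducer=orngCI.FeatureByKramer())
    for attr, quality in gen(data):                # assuming (feature, quality) pairs are returned
        print "%-30s %.3f" % (attr.name, quality)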
orngCI.StructureInducer induces a hierarchy of attributes as proposed by Zupan et al (1998). At each step, it induces attributes from subsets of the current attribute set. If any attributes are found, one of them is selected, inserted into the domain, and its bound attributes are removed. This is repeated until there is only one attribute left or until the selected attribute includes all remaining attributes as its bound set (i.e. if new attributes are induced from pairs of existing attributes, the induction stops when there are only two attributes left). The result of calling StructureInducer is a function that computes the class attribute from the remaining attributes. Using the functions boundset() and getValueFrom, it is possible to descend through the hierarchy.
featureInducer: a feature construction algorithm (such as FeatureByMinError or FeatureByKramer) or another Python function or class with the same behaviour.

subsetsGenerator: a generator of bound sets; StructureInducer constructs a FeatureGenerator and initializes it with featureInducer and subsetsGenerator.

A redundancy remover, such as orngCI.AttributeRedundanciesRemover, can also be given; if it is, it is called prior to structure induction (but not during induction).

Duplicate features are normally not kept (to keep them, set keepDuplicates to 1).

This example shows how to induce a structure using Kramer's method for feature construction, where data stores examples for the Monk 1 dataset.
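A hedged sketch (the keyword argument and the use of boundset() follow the description above):

    inducer = orngCI.StructureInducer(featureInducer=orngCI.FeatureByKramer())
    root = inducer(data)                           # a classifier computing the class attribute
    print [attr.name for attr in root.boundset()]  # the top level of the hierarchy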
Class HINT can be used as a Learner.
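A sketch of using HINT as a learner (passing type as a keyword argument is an assumption; the argument itself is described below):

    hint = orngCI.HINT(type="auto")
    classifier = hint(data)
    print classifier(data[0]), data[0].getclass()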
Basically, HINT is just a wrapper around StructureInducer. The user can specify the type of decomposition (minimal complexity, minimal error) and the size of the bound sets. Alternatively, HINT can use defaults. When minimal-error decomposition is used, HINT fits the parameter m for error estimation by performing a 5-fold cross validation to find the value that gives the maximal classification accuracy.

The type argument can be "complexity", "error", or "auto". This, basically, sets the StructureInducer's featureInducer to FeatureByMinError or FeatureByMinComplexity. When type is "auto", HINT tries to determine whether the domain contains noise or not by observing whether there are examples with the same values of attributes but different classes. If there are no such examples, minimal-complexity decomposition is used; if there are, it switches to minimal-error. FeatureByMinError is used if m is given, and FeatureByMinComplexity if it is not.

Besides the learner, the module offers a few utility functions. addAttribute adds an attribute to the domain. A new domain (and a new ExampleTable) is constructed; the original is left intact. Values for the new attribute are computed through its getValueFrom function.
A companion function is similar to addAttribute, but also removes the bound attributes from the domain.
Another function prints out a hierarchy starting with a given attribute or a classifier. In the latter case, the classifier must support the boundset method; classifiers returned by StructureInducer and HINT have this method.
You've seen an example of how to use this function a while ago.
A related function writes a hierarchy starting with a given attribute or a classifier to a file for the tree-drawing program Dot from the GraphViz package. If a classifier is given, it must support the boundset method; classifiers returned by StructureInducer and HINT have this method. The file can be given as a string, in which case a new file is created, or as an opened file, in which case the tree is appended to it.
S. Kramer (1994). CN2-MCI: A Two-Step Method for Constructive Induction. Proc. of the ML-COLT '94 Workshop on Constructive Induction and Change of Representation. (http://www.hpl.hp.com/personal/Tom_Fawcett/CICR/kramer.ps.gz)
S. B. Thrun et al. (1991): A performance comparison of different learning algorithms. Carnegie Mellon University CMU-CS-91-197.
B. Zupan (1997). Machine learning based on function decomposition. PhD Thesis at University of Ljubljana.
B. Zupan, M. Bohanec, J. Demsar and I. Bratko (1998). Feature transformation by function decomposition. IEEE Intelligent Systems and Their Applications, vol. 13, pp. 38-43.