Building ensemble classifiers in Orange is simple and easy. Starting from learners/classifiers that can predict probabilities and, if needed, use example weights, ensembles are essentially wrappers that aggregate the predictions of a list of constructed classifiers. These wrappers behave exactly like other Orange learners/classifiers. We will first show how to use a module for bagging and boosting that is included in the Orange distribution (the orngEnsemble module), and then, for a somewhat more advanced example, build our own ensemble learner.
First, there is a module for bagging and boosting, and using it is very easy: you define a learner and give it to the bagger or booster, which in turn returns a new (bagged or boosted) learner. Here is an example:
ensemble3.py (uses promoters.tab)
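A minimal sketch of what such a script might look like, assuming the orngTest.crossValidation and orngStat.CA evaluation helpers and an illustrative k-NN base learner (the k=11 setting is an arbitrary choice, not prescribed by the text):

import orange, orngEnsemble, orngTest, orngStat

data = orange.ExampleTable("promoters")

# base learner; k=11 is an arbitrary, illustrative choice
knn = orange.kNNLearner(k=11)
knn.name = "k-NN"

# bagging or boosting a learner takes a single line each
bagged_knn = orngEnsemble.BaggedLearner(knn, t=10)
bagged_knn.name = "bagged k-NN"
boosted_knn = orngEnsemble.BoostedLearner(knn, t=10)
boosted_knn.name = "boosted k-NN"

# compare the plain and the ensemble learners by cross validation
learners = [knn, bagged_knn, boosted_knn]
results = orngTest.crossValidation(learners, data, folds=10)
print("Classification Accuracy:")
for i in range(len(learners)):
    print("%12s: %5.3f" % (learners[i].name, orngStat.CA(results)[i]))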
Most of the code defines and names the objects that learn; the last few lines report the evaluation results. Notice that bagging or boosting a learner takes only a single line of code (for instance, bagged_knn = orngEnsemble.BaggedLearner(knn, t=10)). The parameter t in both bagging and boosting sets the number of classifiers that take part in the voting (or, if you prefer, the number of bagging/boosting iterations). Since both methods rely on a random generator, the exact accuracies you obtain may vary from run to run.
If you have followed this tutorial from the beginning, building your own ensemble learner is nothing new: you have already built a module for bagging. Here is another, similar example: we will build a learner that takes a list of learners, obtains classifiers by training them on the example set, and, when classifying, uses the classifiers to estimate class probabilities and goes with the class predicted with the highest probability. That is, in the end, the prediction of a single classifier counts. If class probabilities are requested, they are reported as computed by that very classifier.
Here is the code that implements our learner and classifier:
part of ensemble2.py
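A sketch of such an implementation, following the learner factory pattern used elsewhere in this tutorial (a WinnerLearner function paired with a WinnerLearner_Class) and assuming old-style Orange calling conventions (the resultType flags orange.GetValue, orange.GetProbabilities and orange.GetBoth, and iterable probability distributions), could look like this:

import orange

def WinnerLearner(examples=None, **kwds):
    # factory: with examples, train and return a classifier;
    # without, return a learner to be called later
    learner = WinnerLearner_Class(**kwds)
    if examples:
        return learner(examples)
    else:
        return learner

class WinnerLearner_Class:
    def __init__(self, name='winner classifier', learners=None):
        self.name = name
        self.learners = learners

    def __call__(self, examples, weight=0):
        # train each of the given learners on the same data
        classifiers = [l(examples, weight) for l in self.learners]
        return WinnerClassifier(classifiers=classifiers, name=self.name,
                                domain=examples.domain)

class WinnerClassifier:
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

    def __call__(self, example, resultType=orange.GetValue):
        # obtain a class probability distribution from every classifier
        pmatrix = [c(example, orange.GetProbabilities)
                   for c in self.classifiers]
        # the winner is the classifier most confident in its prediction
        winner = max(range(len(pmatrix)), key=lambda i: max(pmatrix[i]))
        p = pmatrix[winner]
        value = orange.Value(self.domain.classVar, list(p).index(max(p)))
        if resultType == orange.GetValue:
            return value
        elif resultType == orange.GetProbabilities:
            return p
        else:
            return (value, p)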
WinnerLearner_Class stores the learners and, when called with a data set, constructs the classifiers and passes them to WinnerClassifier. When the latter is called with a data instance, it computes the class probabilities, finds the class with the highest probability, and accordingly reports either the class, the probabilities, or both. Notice that we have also taken care that this learner/classifier conforms to everything that is expected from such objects in Orange, so it may be used by other modules, most importantly for classifier validation.
The following script shows how our new learner can be used:
part of ensemble2.py (uses promoters.tab)
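A sketch of this evaluation code, under the same assumptions as above and with two illustrative base learners (a classification tree and naive Bayes; the pruning parameter is arbitrary), might be:

import orange, orngTree, orngTest, orngStat

data = orange.ExampleTable("promoters")

# two illustrative base learners (parameters are arbitrary)
tree = orngTree.TreeLearner(mForPruning=5.0)
tree.name = "tree"
bayes = orange.BayesLearner()
bayes.name = "bayes"

# our ensemble learner, defined earlier in the same file
winner = WinnerLearner(learners=[tree, bayes])
winner.name = "winner"

learners = [tree, bayes, winner]
results = orngTest.crossValidation(learners, data, folds=10)
print("Classification Accuracy:")
for i in range(len(learners)):
    print("%10s: %5.3f" % (learners[i].name, orngStat.CA(results)[i]))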
Notice again that invoking our new objects and using them for machine learning is just as easy as using any other learner. When you run this script, it reports the cross-validated classification accuracy of each learner; the exact numbers will depend on the random sampling used by cross-validation.
The example script above that implements WinnerLearner and WinnerClassifier may easily be adapted to the ensemble learner you need. As an exercise, change the learner so that it uses cross-validation to estimate the probability that each classifier will predict correctly, and, when classifying, use these estimates to weight the classifiers, replacing the winner-takes-all schema with weighted voting.