This lesson introduces two types of objects: learners and classifiers. Orange has a number of built-in learners. For instance, orange.BayesLearner is a naive Bayesian learner. When data is passed to a learner (e.g., orange.BayesLearner(data)), it returns a classifier. When a data instance is presented to a classifier, it returns a class, a vector of class probabilities, or both.
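This learner/classifier calling convention can be illustrated without Orange at all. The following is a minimal, self-contained sketch of the same pattern in plain Python: a "learner" is called with data and returns a "classifier", and the classifier is called with an instance to get either a class or class probabilities. All names and the toy data here are illustrative, not part of Orange's API.

```python
# A library-free sketch of the learner/classifier protocol, using a
# tiny naive Bayes model over binary ('y'/'n') features.
from collections import Counter, defaultdict

def naive_bayes_learner(data):
    """data: list of (features_tuple, label). Returns a classifier function."""
    class_counts = Counter(label for _, label in data)
    # counts[label][i][value] = how often feature i took `value` in that class
    counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in data:
        for i, value in enumerate(features):
            counts[label][i][value] += 1

    def classifier(features, return_probabilities=False):
        scores = {}
        for label, n in class_counts.items():
            p = n / len(data)  # class prior
            for i, value in enumerate(features):
                # Laplace-smoothed conditional probability (binary features,
                # hence the +2 in the denominator)
                p *= (counts[label][i][value] + 1) / (n + 2)
            scores[label] = p
        total = sum(scores.values())
        probs = {label: s / total for label, s in scores.items()}
        best = max(probs, key=probs.get)
        return probs if return_probabilities else best

    return classifier

# Toy vote data: ('y'/'n' on two issues, party label)
data = [(('y', 'y'), 'democrat'), (('y', 'n'), 'democrat'),
        (('n', 'n'), 'republican'), (('n', 'y'), 'republican'),
        (('n', 'n'), 'republican')]

clf = naive_bayes_learner(data)                   # learner + data -> classifier
print(clf(('y', 'y')))                            # -> 'democrat'
print(clf(('y', 'y'), return_probabilities=True))  # class probabilities
```

The point is the protocol, not the model: the learner is called once with the whole data set, and the object it returns is itself callable on individual instances.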
Let us see how this works in practice. For a start, we will construct a naive Bayesian classifier from the voting data set and use it to classify the first five instances from that set (don't worry about overfitting for now).
classifier.py (uses voting.tab)
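The script itself is not reproduced here; based on the description that follows, it likely looked something like the sketch below. Legacy Orange ran on Python 2, and the exact loop and printout format are assumptions.

```
# A sketch of what classifier.py likely contained (Python 2 / legacy Orange);
# the printout format is an assumption.
import orange

data = orange.ExampleTable("voting")      # loads voting.tab
classifier = orange.BayesLearner(data)    # learner + data -> classifier
for i in range(5):
    predicted = classifier(data[i])       # classifier + instance -> class
    print "original:", data[i].getclass(), "predicted:", predicted
```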
The script loads the data, uses it to construct a naive Bayesian classifier, and then classifies the first five instances of the data set. Note that both the original class and the class assigned by the classifier are printed out.
The data set that we use includes the votes of each U.S. House of Representatives Congressman on 16 key issues; the class is the representative's party. There are 435 data instances (267 Democrats and 168 Republicans) in the data set (see the UCI ML Repository's voting-records data set for a further description). Running the script shows that naive Bayes makes a mistake on the third instance but otherwise predicts the first five instances correctly.
To find out what probability the classifier assigns to, say, the democrat class, we need to call the classifier with an additional parameter, orange.GetProbabilities. Also note that the democrats have class index 1 (we can find this out with print data.domain.classVar.values; notice that indices in Python start with 0). We have already indicated the order of the classes in voting.tab: instead of writing discrete for the class variable, we listed its set of possible values in the desired order.
classifier2.py (uses voting.tab)
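Again, the script itself is not reproduced here; based on the surrounding description, it likely looked something like the sketch below. Legacy Orange ran on Python 2, and the variable names and printout format are assumptions.

```
# A sketch of what classifier2.py likely contained (Python 2 / legacy Orange);
# names and formatting are assumptions.
import orange

data = orange.ExampleTable("voting")
classifier = orange.BayesLearner(data)
print data.domain.classVar.values         # class values, in their declared order
for i in range(5):
    p = classifier(data[i], orange.GetProbabilities)
    # p[1] is the probability of the class with index 1 (democrat)
    print "%d: %5.3f (originally %s)" % (i + 1, p[1], data[i].getclass())
```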
The output of this script shows, for example, that on the third instance naive Bayes not only misclassified, but missed quite substantially: it assigned only a 0.005 probability to the correct class.
In Orange, most classifiers can predict the class, the class probabilities, or both, so what you have learned here is rather general. If you want a taste of some of Orange's other classifiers, check the next lesson. Alternatively, you may go directly to see how classifiers are tested and evaluated.