Let us build a learner/classifier that extends the built-in naive Bayes and that discretizes (categorizes) the data before learning (see also the lesson on Categorization). We will define a module nbdisc.py that implements our method. As explained in the introductory text on learners/classifiers, it will implement two classes, Learner and Classifier. First, here is the definition of the Learner class:
function Learner from nbdisc.py
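(The listing below is a minimal sketch reconstructed from the description in this lesson; it assumes the standard Orange calls orange.Preprocessor_discretize, orange.EntropyDiscretization and orange.BayesLearner, so take it as a reconstruction rather than the verbatim file.)

import orange

class Learner_Class(object):
    def __new__(cls, examples=None, **kwds):
        learner = object.__new__(cls)
        if examples:
            # examples were passed: learn immediately and return a classifier
            learner.__init__(**kwds)
            return learner(examples)
        else:
            # no examples: return the learner itself (Python then calls __init__)
            return learner

    def __init__(self, name='discretized bayes'):
        # remember the learner's name; any other learner arguments
        # would be stored through self here as well
        self.name = name

    def __call__(self, data, weight=0):
        # discretize the data with Fayyad & Irani's entropy-based method,
        # build a naive Bayesian model on it, and wrap the model in a
        # Classifier (defined further below in the same module)
        disc = orange.Preprocessor_discretize(data,
            method=orange.EntropyDiscretization())
        model = orange.BayesLearner(disc, weight)
        return Classifier(classifier=model, name=self.name)

Learner = Learner_Class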
Learner_Class has three methods. Method __new__ creates the object and returns either a learner or a classifier, depending on whether examples were passed to the call. If examples were passed as an argument, the method calls the learner (invoking the __call__ method). Method __init__ is invoked whenever a new instance of the class is created. Notice that all it does is remember the only argument this class can be called with, i.e. the argument name, which defaults to 'discretized bayes'. If you expect any other arguments for your learners, you should handle them here (store them as the object's attributes using the keyword self).
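For illustration, here is a short interactive sketch of the two calling conventions (assuming nbdisc.py as above and the iris data set):

>>> import orange, nbdisc
>>> data = orange.ExampleTable("iris")
>>> learner = nbdisc.Learner()          # no examples: we get a learner
>>> print learner.name
discretized bayes
>>> classifier = nbdisc.Learner(data)   # with examples: we get a classifier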
If we have created an instance of the learner (and did not pass the examples as arguments), the next call of this learner will invoke the method __call__, where the essence of our learner is implemented. Notice also that we have included an argument for a vector of instance weights, which is passed on to the naive Bayesian learner. In our learner, we first discretize the data using Fayyad & Irani's entropy-based discretization, then build a naive Bayesian model, and finally pass it to the class Classifier. You may expect that at its first invocation the Classifier will just remember the model we have called it with:
class Classifier from nbdisc.py
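(Again a sketch consistent with the description that follows, not the verbatim listing; orange.GetValue is the usual default result type in Orange.)

class Classifier:
    def __init__(self, **kwds):
        # remember all arguments the classifier was constructed with;
        # they become accessible as self.argument_name
        self.__dict__.update(kwds)

    def __call__(self, example, result_type=orange.GetValue):
        # classify the example; result_type selects the class value,
        # the probabilities, or both (orange.GetValue,
        # orange.GetProbabilities, orange.GetBoth)
        return self.classifier(example, result_type)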
The method __init__ in Classifier is rather general: it makes Classifier remember all arguments it was called with. They are then accessed through the Classifier's attributes (self.argument_name). When Classifier is called, it expects an example and an optional argument that specifies the type of result to be returned.
This completes our code for a naive Bayesian classifier with discretization. You can see that the code is fairly short (fewer than 20 lines), and it can easily be extended or changed if we want to do something else as well (include feature subset selection, for instance, …).
Now, here are a few lines to test our code:
> python
>>> import orange, nbdisc
>>> data = orange.ExampleTable("iris")
>>> classifier = nbdisc.Learner(data)
>>> print classifier(data[100])
Iris-virginica
>>> classifier(data[100], orange.GetBoth)
(<orange.Value 'iris'='Iris-virginica'>, <0.000, 0.001, 0.999>)
>>>
For a more elaborate test that also shows the use of a learner (one that is not given the data at initialization), here is a script that performs 10-fold cross validation:
nbdisc_test.py (uses iris.tab and nbdisc.py)
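(The script might look something like the sketch below; it assumes the orngTest and orngStat helper modules that ship with Orange, and the exact helper names are an assumption rather than a quote from the original file.)

import orange, orngTest, orngStat, nbdisc

data = orange.ExampleTable("iris")
# 10-fold cross validation of our learner, constructed without data
results = orngTest.crossValidation([nbdisc.Learner()], data, folds=10)
print "Accuracy = %5.3f" % orngStat.CA(results)[0]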
The accuracy on this data set is about 92%. You may try to obtain a better accuracy by using some other type of discretization, or try some other learner on this data (hint: k-NN should perform better).
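To follow the hint, one might compare the two learners within the same cross-validation run (again a sketch; orange.kNNLearner and its parameter k are assumed from Orange's built-in k-NN):

import orange, orngTest, orngStat, nbdisc

data = orange.ExampleTable("iris")
learners = [nbdisc.Learner(), orange.kNNLearner(k=10)]
results = orngTest.crossValidation(learners, data, folds=10)
# report classification accuracy for each learner
for name, acc in zip(["discretized bayes", "k-NN"], orngStat.CA(results)):
    print "%s: %5.3f" % (name, acc)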
You can now read on to see how the same scheme for developing your own classifier was used to assemble an all-in-Python naive Bayesian method, and how easy it is to implement bagging.