
Naive Bayes with Discretization

Let us build a learner/classifier that extends the built-in naive Bayes and categorizes (discretizes) the data before learning (see also the lesson on Categorization). We will define a module nbdisc.py that implements our method. As explained in the introductory text on learners/classifiers, it will implement two classes, Learner and Classifier. First, here is the definition of the Learner class:

class Learner from nbdisc.py

class Learner(object):
    def __new__(cls, examples=None, name='discretized bayes', **kwds):
        learner = object.__new__(cls)
        if examples:
            learner.__init__(name)   # force init
            return learner(examples)
        else:
            return learner           # invokes the __init__

    def __init__(self, name='discretized bayes'):
        self.name = name

    def __call__(self, data, weight=None):
        disc = orange.Preprocessor_discretize(
            data, method=orange.EntropyDiscretization())
        model = orange.BayesLearner(disc, weight)
        return Classifier(classifier=model)

Learner has three methods. Method __new__ creates the object and returns either a learner or a classifier, depending on whether examples were passed to the call. If examples were passed, the method calls the freshly created learner (invoking its __call__ method) and returns the resulting classifier. Method __init__ is invoked whenever a learner object is created (when examples are passed, __new__ calls it explicitly). Notice that all it does is remember the only argument this class can be called with, the argument name, which defaults to 'discretized bayes'. If you expect any other arguments for your learners, you should handle them here (store them as the instance's attributes using the keyword self).
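For illustration, here is a minimal sketch of the two ways the class can be invoked; the name 'my bayes' is an arbitrary example:

import orange, nbdisc

data = orange.ExampleTable("iris")

# called with examples, Learner builds and returns a classifier at once
classifier = nbdisc.Learner(data)

# called without examples, it returns a learner object for later use
learner = nbdisc.Learner(name='my bayes')
classifier = learner(data)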

If we have created an instance of the learner (without passing the examples), the next call of this learner invokes the method __call__, where the essence of our learner is implemented. Notice also that we have included an argument for a vector of instance weights, which is passed on to the naive Bayesian learner. In our learner, we first discretize the data using Fayyad & Irani's entropy-based discretization, then build a naive Bayesian model, and finally pass it to the class Classifier. You may expect that at its first invocation the Classifier will just remember the model we have called it with:

class Classifier from nbdisc.py

class Classifier:
    def __init__(self, **kwds):
        self.__dict__.update(kwds)

    def __call__(self, example, resultType=orange.GetValue):
        return self.classifier(example, resultType)

The method __init__ in Classifier is rather general: it makes Classifier remember all arguments it was called with. They can then be accessed as the Classifier's attributes (self.argument_name). When Classifier is called, it expects an example and an optional argument that specifies the type of result to be returned.
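To see this behavior in isolation, here is a small sketch; the extra keyword argument note is hypothetical and serves only to show that any argument is remembered:

import orange
from nbdisc import Classifier

data = orange.ExampleTable("iris")
model = orange.BayesLearner(data)

c = Classifier(classifier=model, note='wraps a naive Bayesian model')
print c.note                               # any keyword becomes an attribute
print c(data[0])                           # default result type is orange.GetValue
print c(data[0], orange.GetProbabilities)  # ask for class probabilities instead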

This completes our code for the naive Bayesian classifier with discretization. You can see that the code is fairly short (fewer than 20 lines), and it can easily be extended or changed if we want to do something else as well (include feature subset selection, for instance, …).

Here are a few lines to test our code:

uses iris.tab and nbdisc.py

> python
>>> import orange, nbdisc
>>> data = orange.ExampleTable("iris")
>>> classifier = nbdisc.Learner(data)
>>> print classifier(data[100])
Iris-virginica
>>> classifier(data[100], orange.GetBoth)
(<orange.Value 'iris'='Iris-virginica'>, <0.000, 0.001, 0.999>)
>>>

For a more elaborate test that also shows the use of a learner (one that is not given the data at its initialization), here is a script that does 10-fold cross-validation:

nbdisc_test.py (uses iris.tab and nbdisc.py)

import orange, orngEval, nbdisc

data = orange.ExampleTable("iris")
results = orngEval.CrossValidation([nbdisc.Learner()], data)
print "Accuracy = %5.3f" % orngEval.CA(results)[0]

The accuracy on this data set is about 92%. You may try to obtain better accuracy by using some other type of discretization, or try some other learner on this data (hint: k-NN should perform better); a sketch of such a comparison follows.
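Here is a minimal sketch of such a comparison, assuming orange.kNNLearner is available in your installation; the choice of k=10 is an arbitrary guess, not a tuned value:

import orange, orngEval, nbdisc

data = orange.ExampleTable("iris")
learners = [nbdisc.Learner(), orange.kNNLearner(k=10)]
results = orngEval.CrossValidation(learners, data)
for i in range(len(learners)):
    print "Accuracy %d = %5.3f" % (i, orngEval.CA(results)[i])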

You can now read on to see how the same scheme for developing your own classifier was used to assemble an all-in-Python naive Bayesian method, and how easy it is to implement bagging.


