orngDisc: Orange Discretization Module


The orngDisc module implements functions and classes for categorization (discretization) of continuous attributes. Core Orange already has several classes for this task; orngDisc currently adds a function for entropy-based discretization (Fayyad & Irani, 1993) and a wrapper around categorization classes that can be used for learning.

entropyDiscretization(data)
A function that takes a classified data set (data) and categorizes all continuous attributes using entropy-based discretization (orange.EntropyDiscretization). After categorization, attributes that were categorized to a single interval (that is, to a constant value) are removed from the data set. Returns a data set that includes all categorical and discretized continuous attributes from the original data set.
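To make the procedure more concrete, the following is a minimal pure-Python sketch of entropy-based discretization with the MDL stopping criterion of Fayyad and Irani, followed by removal of attributes that end up with a single interval. It does not use Orange and is not the orngDisc implementation; the names entropy, mdl_cut_points and entropy_discretize, and the representation of a data set as a dictionary of columns, are assumptions made for illustration only.

```python
import bisect
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def _split(pairs, cuts):
    """Recursively add accepted cut points for (value, label) pairs to cuts."""
    n = len(pairs)
    labels = [label for _, label in pairs]
    base = entropy(labels)
    best = None
    for i in range(1, n):
        # A cut is only possible between two distinct attribute values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        weighted = (i * entropy(labels[:i])
                    + (n - i) * entropy(labels[i:])) / n
        if best is None or weighted < best[0]:
            best = (weighted, i)
    if best is None:
        return
    weighted, i = best
    left, right = labels[:i], labels[i:]
    gain = base - weighted
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * base - k1 * entropy(left) - k2 * entropy(right))
    # MDL criterion: keep the cut only if the gain pays for its encoding cost.
    if gain <= (math.log2(n - 1) + delta) / n:
        return
    cuts.append((pairs[i - 1][0] + pairs[i][0]) / 2)
    _split(pairs[:i], cuts)
    _split(pairs[i:], cuts)


def mdl_cut_points(values, labels):
    """Return the sorted cut points for one continuous attribute."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    cuts = []
    _split(pairs, cuts)
    return sorted(cuts)


def entropy_discretize(columns, labels):
    """Discretize every column; drop columns left with a single interval."""
    result = {}
    for name, values in columns.items():
        cuts = mdl_cut_points(values, labels)
        if cuts:  # no cut points means one interval, i.e. a constant value
            result[name] = [bisect.bisect_right(cuts, v) for v in values]
    return result


labels = ["a"] * 10 + ["b"] * 10
columns = {
    # Values 1..10 belong to class a, 11..20 to class b.
    "informative": list(range(1, 21)),
    # Classes alternate along the sorted values, so no cut is worthwhile.
    "noise": list(range(1, 20, 2)) + list(range(2, 21, 2)),
}
discretized = entropy_discretize(columns, labels)
```

In this toy data the informative column receives one cut point between 10 and 11, while the noise column gets no cut point and is therefore dropped from the result.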
EntropyDiscretization([data])
This is a simple wrapper class around the function entropyDiscretization. When invoked without arguments, it creates an object that can later be passed a data set for discretization; when invoked with a data set, it returns the discretized data set directly:

    discretizer = orngDisc.EntropyDiscretization()
    disc_data = discretizer(data)
    another_disc_data = orngDisc.EntropyDiscretization(data)
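How a class can support both kinds of invocation is not covered here. One way to obtain this behaviour in Python is to override __new__, as in the sketch below. The class name MeanThresholdDiscretization is hypothetical, and a trivial mean-threshold rule on a plain list of numbers stands in for the entropy-based discretization of a data set; this is an illustration of the calling pattern, not the orngDisc implementation.

```python
class MeanThresholdDiscretization:
    """Hypothetical dual-mode discretizer (not orngDisc's implementation)."""

    def __new__(cls, data=None):
        self = super().__new__(cls)
        if data is None:
            return self        # no data: hand back a reusable discretizer
        return self(data)      # data given: discretize immediately

    def __call__(self, data):
        # Stand-in discretization: 0 below the mean, 1 otherwise.
        mean = sum(data) / len(data)
        return [0 if value < mean else 1 for value in data]


values = [1.0, 2.0, 6.0, 7.0]

# Two-step use: create the object first, pass the data later.
discretizer = MeanThresholdDiscretization()
disc_data = discretizer(values)

# One-step use: the call returns the discretized data, not an object.
another_disc_data = MeanThresholdDiscretization(values)
```

Because __new__ returns a plain list when data is given, Python skips __init__ and the caller receives the discretized data directly.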
DiscretizedLearner([baseLearner[, examples[, discretizer[, name]]]])
This class wraps a learner object so that the data passed to the learner is discretized before learning. In this way we can prepare a learner without giving it the data and, for instance, use it in a standard testing procedure that repeats learning and testing on several data samples. The default discretization procedure (discretizer) is orngDisc.EntropyDiscretization. An example of how such a learner is set up and used in ten-fold cross-validation is given below:

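    import orange, orngDisc, orngEval

    # data is assumed to be a classified data set loaded beforehand
    bayes = orange.BayesLearner()

    # default discretizer (orngDisc.EntropyDiscretization)
    dBayes = orngDisc.DiscretizedLearner(bayes, name='disc bayes')

    # explicitly chosen discretizer with 10 intervals per attribute
    dbayes2 = orngDisc.DiscretizedLearner(bayes, name="EquiNBayes",
        discretizer=orange.Preprocessor_discretize(
            method=orange.EquiNDiscretization(numberOfIntervals=10)))

    results = orngEval.CrossValidation([dBayes], data)

    # with examples given, learning is performed immediately
    classifier = orngDisc.DiscretizedLearner(bayes, examples=data)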
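The idea behind such a wrapper can be sketched without Orange. In the example below, every name (fit_equal_width, table_learner, DiscretizingLearner) is hypothetical, a simple equal-width discretizer stands in for Orange's discretization classes, and a toy lookup-table learner stands in for the base learner. The point illustrated is that the cut points are learned from the training data only, and the resulting classifier applies the same cut points to every example it is asked to classify.

```python
import bisect
from collections import Counter


def fit_equal_width(rows, intervals=3):
    """Learn equal-width cut points per column; return a row transformer."""
    cuts = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        step = (high - low) / intervals
        cuts.append([low + step * i for i in range(1, intervals)])

    def transform(row):
        return tuple(bisect.bisect_right(c, v) for c, v in zip(cuts, row))

    return transform


def table_learner(rows, labels):
    """Toy base learner: majority class per discrete row, global fallback."""
    default = Counter(labels).most_common(1)[0][0]
    votes = {}
    for row, label in zip(rows, labels):
        votes.setdefault(row, Counter())[label] += 1

    def classify(row):
        seen = votes.get(row)
        return seen.most_common(1)[0][0] if seen else default

    return classify


class DiscretizingLearner:
    """Hypothetical wrapper: discretize the training data, then learn."""

    def __init__(self, base_learner, fit_discretizer=fit_equal_width,
                 name="discretized"):
        self.base_learner = base_learner
        self.fit_discretizer = fit_discretizer
        self.name = name

    def __call__(self, rows, labels):
        # Cut points come from the training rows passed in this call.
        transform = self.fit_discretizer(rows)
        classify = self.base_learner([transform(r) for r in rows], labels)
        # The classifier discretizes new rows with the same cut points.
        return lambda row: classify(transform(row))


# The learner is prepared without data and can be reused on many samples.
learner = DiscretizingLearner(table_learner, name="disc table")

train_rows = [(1.0,), (1.2,), (2.9,), (3.0,)]
train_labels = ["low", "low", "high", "high"]
classifier = learner(train_rows, train_labels)
prediction = classifier((2.8,))
```

Since the learner object holds no data, a testing procedure such as cross-validation can call it once per training sample, so that discretization is repeated on each sample rather than done once on the whole data set.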

Examples

The chapter on feature subset selection in the Orange for Beginners tutorial shows the use of DiscretizedLearner. Other discretization classes from core Orange are listed in the chapter on categorization in the same tutorial.

References

U. M. Fayyad and K. B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1022-1029, Chambery, France, 1993.