This module contains miscellaneous useful functions for counting, criteria-based selections and similar.
orngMisc contains a bunch of classes that generate sequences of various kinds.
A class which represents a boolean counter. The constructor is given the number of bits, and during iteration the counter returns a list of that length containing 0s and 1s.
One way to use the counter is within a for-loop:
You can also call it manually.
The current state (the last result of a call to next) is also stored in the attribute state.
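The behaviour described above can be sketched in a few lines of Python 3. This is only an illustration, not the module's own code: the class name BooleanCounterSketch is hypothetical, and the counting order (rightmost bit changes fastest) is an assumption.

```python
class BooleanCounterSketch:
    """Hypothetical stand-in: iterates over all lists of `bits` zeros and ones."""

    def __init__(self, bits):
        self.bits = bits
        self.state = None  # last list returned by next(), as described in the text

    def __iter__(self):
        return self

    def __next__(self):
        if self.state is None:
            nxt = [0] * self.bits
        else:
            nxt = list(self.state)
            # Add one in binary: turn trailing 1s into 0s, then set the first 0 to 1.
            for i in reversed(range(self.bits)):
                if nxt[i] == 0:
                    nxt[i] = 1
                    break
                nxt[i] = 0
            else:
                raise StopIteration  # every bit was 1, so the count is complete
        self.state = nxt
        return nxt


# Used within a for-loop
for bits in BooleanCounterSketch(3):
    print(bits)

# Called manually
counter = BooleanCounterSketch(2)
first = next(counter)
second = next(counter)
```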
This class is similar to BooleanCounter except that the digits do not count from 0 to 1, but to the limits that are specified individually for each digit.
The class can be used in a for-loop or by manually calling next, just like BooleanCounter above.
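A per-digit limit can be illustrated with a short generator. This is a sketch under stated assumptions, not the module's implementation: the function name limited_counter is hypothetical, and each limit is assumed to be exclusive (a digit with limit 3 takes the values 0, 1 and 2), by analogy with a boolean digit having two values.

```python
from itertools import product


def limited_counter(limits):
    """Yield every digit list in which digit i runs from 0 to limits[i] - 1.

    Assumption: limits are exclusive upper bounds.
    """
    # itertools.product varies the rightmost digit fastest, like a counter.
    for digits in product(*(range(limit) for limit in limits)):
        yield list(digits)


for digits in limited_counter([2, 3]):
    print(digits)
```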
The nondecreasing counter generates all non-decreasing integer sequences in which no numbers are skipped, that is, if n is in the sequence, the sequence also includes all numbers between 0 and n. For instance, [0, 0, 1, 0] is illegal since it decreases, and [0, 0, 2, 2] is illegal since it has 2 without having 1 first.
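The two conditions together mean that a legal sequence starts with 0 and that each element either repeats the previous one or exceeds it by exactly 1. A minimal sketch that enumerates such sequences follows; the function name nondecreasing_sequences is hypothetical and the enumeration order is an assumption.

```python
def nondecreasing_sequences(length):
    """Yield all non-decreasing sequences of `length` that skip no number."""

    def extend(prefix):
        if len(prefix) == length:
            yield list(prefix)
            return
        last = prefix[-1]
        # The next element may repeat the last value or be one larger.
        for value in (last, last + 1):
            yield from extend(prefix + [value])

    if length > 0:
        yield from extend([0])


for sequence in nondecreasing_sequences(4):
    print(sequence)
```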
Returns all sequences of a given length in which no numbers are skipped (see above) and no generated sequence can be turned into another merely by renaming the labels. For instance, [0, 2, 2, 1] and [1, 0, 0, 2] are considered equivalent: if we take the former and replace 0 by 1, 2 by 0 and 1 by 2, we get the latter.
The generated sequences are equivalent to all possible functions from a set whose cardinality equals the sequences' length.
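One way to pick a single representative from each group of equivalent sequences is to require that labels appear for the first time in the order 0, 1, 2, and so on. The sketch below takes that approach; it is an assumption about how representatives are chosen, and the names canonical_form and canonic_sequences are hypothetical.

```python
def canonical_form(sequence):
    """Relabel a sequence so that new labels first appear in the order 0, 1, 2, ..."""
    relabel = {}
    return [relabel.setdefault(value, len(relabel)) for value in sequence]


def canonic_sequences(length):
    """Yield one representative per group of sequences equal up to relabelling."""

    def extend(prefix, largest):
        if len(prefix) == length:
            yield list(prefix)
            return
        # A new element may reuse any label seen so far or introduce the next one.
        for value in range(largest + 2):
            yield from extend(prefix + [value], max(largest, value))

    if length > 0:
        yield from extend([0], 0)


# Both sequences from the text reduce to the same representative.
print(canonical_form([0, 2, 2, 1]))
print(canonical_form([1, 0, 0, 2]))
```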
Many machine learning techniques generate a set of different solutions or have to choose, as for instance in classification tree induction, between different attributes. The most trivial solution is to iterate through the candidates, compare them and remember the optimal one. A problem occurs, however, when there are multiple candidates that are equally good: the naive approaches would select the first or the last one, depending upon the formulation of the if-statement.
orngMisc provides a class that makes a random choice in such cases. Each new candidate is compared with the currently optimal one; it replaces the optimal one if it is better, while if they are equal, one is chosen at random. The number of competing optimal candidates is stored, so in this random choice the probability of selecting the new candidate (over the current one) is 1/w, where w is the current number of equal candidates, including the present one. One can easily verify that this gives equal chances to all candidates, independent of the order in which they are presented.
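The 1/w rule can be sketched as follows. This is a simplified illustration, not the module's class: the name TieBreakingBest and its methods winner and winner_index are hypothetical, candidates are compared with Python's own operators rather than a compare function, and the seed handling is an assumption.

```python
import random


class TieBreakingBest:
    """Hypothetical sketch: remembers the best candidate, breaking ties at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.best = None
        self.best_index = -1
        self.ties = 0    # w: number of equally good optimal candidates seen so far
        self.count = 0   # number of candidates seen so far

    def candidate(self, x):
        if self.ties == 0 or x > self.best:
            # Strictly better: x becomes the only optimal candidate.
            self.best, self.best_index, self.ties = x, self.count, 1
        elif x == self.best:
            self.ties += 1
            # Replace the current winner with probability 1/w.
            if self.rng.randrange(self.ties) == 0:
                self.best, self.best_index = x, self.count
        self.count += 1

    def winner(self):
        return self.best

    def winner_index(self):
        return self.best_index


finder = TieBreakingBest(seed=1)
for value in [1, 3, 2, 3, 3]:
    finder.candidate(value)
print(finder.winner(), finder.winner_index())
```

With three tied candidates, the first survives both later draws with probability 1/2 · 2/3 = 1/3, the second is chosen and survives with probability 1/2 · 2/3 = 1/3, and the third is chosen with probability 1/3.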
A class which finds the optimal object in a sequence of objects. The class is fed the candidates one by one, and remembers the "winner". It can thus be used by methods that generate different "solutions" to a problem and need to select the optimal one, but do not want to store them all.
Methods
If callCompareOn1st is set, BestOnTheFly will assume that the candidates are lists or tuples, and it will call compare with the first element of the tuple.
New candidates are passed to candidate, which accepts a single argument (the candidate).
Function selectBest (it is a separate function, not a method of BestOnTheFly) returns the optimal object from list l. The function can be used if the candidates are already in a list, so using the more complicated BestOnTheFly directly is not needed.
To demonstrate the use of BestOnTheFly and present selectBest: here's how selectBest is implemented.
Similar to selectBest except that it doesn't return the best object but its index in the list l.
These two functions can be used as compare operators for the above class and functions. The first returns the largest and the second the smallest objects (the functions are equal to cmp and -cmp, respectively).
These functions assume that the objects being compared are lists (or tuples). To compare objects x and y they compare x[0] with y[0] (the first two functions) or x[-1] with y[-1] (the other two).
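Compare functions of this kind can be sketched in Python 3, which no longer has the built-in cmp. The function names below are hypothetical and chosen only for illustration; functools.cmp_to_key is used to plug them into the built-in max.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Three-way comparison standing in for the Python 2 built-in cmp."""
    return (a > b) - (a < b)


def compare_first_bigger(x, y):
    return cmp(x[0], y[0])      # prefers the largest first element


def compare_first_smaller(x, y):
    return -cmp(x[0], y[0])     # prefers the smallest first element


def compare_last_bigger(x, y):
    return cmp(x[-1], y[-1])    # prefers the largest last element


def compare_last_smaller(x, y):
    return -cmp(x[-1], y[-1])   # prefers the smallest last element


candidates = [(0.3, "a"), (0.9, "b"), (0.5, "c")]
best = max(candidates, key=cmp_to_key(compare_first_bigger))
print(best)
```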
The following snippet loads the data set lymphography and prints out the attribute with the highest information gain.
part of misc_bestOnTheFly.py
Our candidates are tuples of gain ratios and attributes, so we set callCompareOn1st to make the compare function compare the first element (gain ratios). We could achieve the same by initializing the object like this:
The other way to do it is through indices.
part of misc_bestOnTheFly.py
Here we only give gain ratios to BestOnTheFly, so we don't have to specify a special compare operator. After checking all attributes we get the index of the optimal one by calling winnerIndex.
This function is equivalent to the built-in range, except that start, stop and step can be continuous (float) and that it returns the values between start and stop, including stop (e.g., frange(0, 1, 0.5) returns [0.0, 0.5, 1.0]).
Default arguments are somewhat peculiar.
frange() is equivalent to frange(0, 1, 0.1)
frange(x) is equivalent to frange(x, 1, x); for instance, frange(0.25) gives [0.25, 0.5, 0.75, 1.0]
frange(x, y) is equivalent to frange(0, x, y); therefore frange(3, .5) returns [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
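The default-argument rules listed above can be sketched as follows. The name frange_sketch is hypothetical, and the way the number of steps is computed (with a small tolerance against floating-point rounding) is an assumption, not necessarily how the module does it.

```python
import math


def frange_sketch(*args):
    """Float range that includes `stop`, with the defaults described in the text."""
    if len(args) == 0:
        start, stop, step = 0.0, 1.0, 0.1
    elif len(args) == 1:
        start, stop, step = args[0], 1.0, args[0]   # frange(x) -> frange(x, 1, x)
    elif len(args) == 2:
        start, stop, step = 0.0, args[0], args[1]   # frange(x, y) -> frange(0, x, y)
    elif len(args) == 3:
        start, stop, step = args
    else:
        raise TypeError("frange_sketch takes at most three arguments")
    if step <= 0:
        raise ValueError("step must be positive")
    # Count whole steps; the tolerance keeps `stop` when division rounds down slightly.
    steps = int(math.floor((stop - start) / step + 1e-9))
    return [start + i * step for i in range(steps + 1)]


print(frange_sketch(0, 1, 0.5))
print(frange_sketch(0.25))
print(frange_sketch(3, .5))
```

Computing each value as start + i * step, rather than adding step repeatedly, avoids accumulating rounding error over long ranges.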
Prints out the text text if verb is True. The second argument can be omitted (this is generally recommended); in this case, the text is printed if orngMisc.verbose is True.
This function supports a general "verbose mode" for the application. Functions that have some verbose info to print out should do so through orngMisc.printVerbose.
Most modules sadly ignore this function and have their individual "verbose" arguments.
Returns a suitable name for object x by first checking whether it has the attribute name, shortDescription, description, func_doc or func_name, in that order. If it has none of these but is a class instance, it returns the class name. If none of these succeeds, it returns the value of the argument default.
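The lookup order can be sketched as below. The function name get_object_name is hypothetical. Two points are assumptions: in Python 3 every object is an instance of some class, so the sketch treats only instances of non-built-in classes as "class instances", and func_doc and func_name are Python 2 function attributes that are kept here only to mirror the order given in the text.

```python
NAME_ATTRIBUTES = ("name", "shortDescription", "description", "func_doc", "func_name")


def get_object_name(x, default=""):
    """Return a name for x, trying the attributes in the documented order."""
    for attribute in NAME_ATTRIBUTES:
        if hasattr(x, attribute):
            return getattr(x, attribute)
    # Assumption: built-in objects such as ints do not count as class instances.
    if type(x).__module__ != "builtins":
        return type(x).__name__
    return default


class MyLearner:
    pass


learner = MyLearner()
print(get_object_name(learner))
learner.name = "my learner"
print(get_object_name(learner))
```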
The function is used primarily for getting the name of a learner or classifier.
A function used by several functions (mostly in orngTest) that can accept either an example generator or a tuple (example generator, weight) as an argument. demangleExamples accepts either of these as an argument and returns a tuple (with the weight meta id set to 0 if none was given).
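The unpacking can be sketched in a few lines. The name demangle_examples is hypothetical, a plain list stands in for an example generator, and the sketch assumes that a tuple argument always has exactly two elements.

```python
def demangle_examples(examples):
    """Return (example generator, weight id), using weight id 0 when none is given."""
    if isinstance(examples, tuple):
        generator, weight_id = examples
        return generator, weight_id
    return examples, 0


data = ["example 1", "example 2"]   # stand-in for an example generator
print(demangle_examples(data))
print(demangle_examples((data, 5)))
```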