orngMisc: Miscellaneous Utility Functions

This module contains miscellaneous useful functions for counting, criteria-based selections and similar.

Counters

orngMisc contains a bunch of classes that generate sequences of various kinds.

BooleanCounter

A class which represents a boolean counter. The constructor is given the number of bits; during iteration the counter returns a list of that length containing 0s and 1s.

One way to use the counter is within a for-loop:

>>> for t in orngMisc.BooleanCounter(3):
...     print t
...
[0, 0, 0]
[0, 0, 1]
[0, 1, 0]
[0, 1, 1]
[1, 0, 0]
[1, 0, 1]
[1, 1, 0]
[1, 1, 1]

You can also call it manually.

>>> o = orngMisc.BooleanCounter(3)
>>> o.next()
[0, 0, 0]
>>> o.next()
[0, 0, 1]
>>> o.next()
[0, 1, 0]

The current state (the last result of a call to next) is also stored in the attribute state.
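The same sequence can be reproduced without orngMisc. The following is a minimal sketch written for Python 3 (the examples in this document use Python 2 syntax); it is not the module's implementation, and the name boolean_counter is hypothetical. It relies on itertools.product enumerating the digit combinations in the order shown above.

```python
from itertools import product

def boolean_counter(bits):
    """Yield every list of `bits` binary digits, counting upward from all zeros."""
    for digits in product((0, 1), repeat=bits):
        yield list(digits)

counter = boolean_counter(3)
first = next(counter)   # manual stepping, analogous to o.next() above
rest = list(counter)    # the remaining seven states, in counting order
print(first)
print(rest)
```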

LimitedCounter

This class is similar to BooleanCounter, except that the digits do not count from 0 to 1 but up to limits that are specified individually for each digit. As the example shows, each digit runs from 0 up to, but not including, its limit.

>>> for t in orngMisc.LimitedCounter([3, 5, 2]):
...     print t
...
[0, 0, 0]
[0, 0, 1]
[0, 1, 0]
[0, 1, 1]
[0, 2, 0]
[0, 2, 1]
[0, 3, 0]
[0, 3, 1]
[0, 4, 0]
[0, 4, 1]
[1, 0, 0]
[1, 0, 1]
[1, 1, 0]
[1, 1, 1]
(etc.)

The class can be used in a for-loop or by manually calling next, just as BooleanCounter above.
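To make the role of the limits concrete, here is a small Python 3 sketch that produces the same kind of sequence. It is an illustration only, not the module's code; the name limited_counter is hypothetical, and the exclusive upper bounds are inferred from the example output above.

```python
from itertools import product

def limited_counter(limits):
    """Yield lists in which digit i takes the values 0 .. limits[i] - 1.

    The rightmost digit changes fastest, as in the listing above.
    """
    for digits in product(*(range(limit) for limit in limits)):
        yield list(digits)

states = list(limited_counter([3, 5, 2]))
print(len(states))   # 3 * 5 * 2 = 30 states in total
print(states[:4])    # the first four states
print(states[-1])    # the last state: every digit at its maximum
```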

NondecreasingCounter

NondecreasingCounter generates all non-decreasing integer sequences in which no numbers are skipped, that is, if n is in the sequence, the sequence also includes all numbers between 0 and n. For instance, [0, 0, 1, 0] is illegal since it decreases, and [0, 0, 2, 2] is illegal since it has 2 without having 1 first. An example:

>>> for t in orngMisc.NondecreasingCounter(4):
...     print t
...
[0, 0, 0, 0]
[0, 0, 0, 1]
[0, 0, 1, 1]
[0, 0, 1, 2]
[0, 1, 1, 1]
[0, 1, 1, 2]
[0, 1, 2, 2]
[0, 1, 2, 3]
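The two rules (never decrease, never skip a number) mean that every element either repeats its predecessor or exceeds it by one, and that the sequence starts with 0. The Python 3 sketch below builds the sequences from that observation; it is an assumption-based illustration with a hypothetical name, not the module's implementation.

```python
def nondecreasing_counter(length):
    """Yield non-decreasing lists that start at 0 and never skip a value."""
    def extend(prefix):
        if len(prefix) == length:
            yield list(prefix)
            return
        last = prefix[-1]
        # The next element either repeats the last value or raises it by one.
        for value in (last, last + 1):
            yield from extend(prefix + [value])

    if length > 0:
        yield from extend([0])

sequences = list(nondecreasing_counter(4))
for sequence in sequences:
    print(sequence)
```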

CanonicFuncCounter

Returns all sequences of a given length in which no numbers are skipped (see above) and no generated sequence can be obtained from another merely by renaming the labels. For instance, [0, 2, 2, 1] and [1, 0, 0, 2] are considered equivalent: if we take the former and replace 0 by 1, 2 by 0 and 1 by 2, we get the latter.

Each generated sequence corresponds to one function defined on a set whose cardinality equals the sequence length, where functions that differ only in the labels of their values are treated as the same.

>>> for t in orngMisc.CanonicFuncCounter(4):
...     print t
...
[0, 0, 0, 0]
[0, 0, 0, 1]
[0, 0, 1, 0]
[0, 0, 1, 1]
[0, 0, 1, 2]
[0, 1, 0, 0]
[0, 1, 0, 1]
[0, 1, 0, 2]
[0, 1, 1, 0]
[0, 1, 1, 1]
[0, 1, 1, 2]
[0, 1, 2, 0]
[0, 1, 2, 1]
[0, 1, 2, 2]
[0, 1, 2, 3]
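One way to obtain exactly one representative per group of equivalent sequences is to require that the first element is 0 and that each later element is at most one larger than the largest value seen so far. The Python 3 sketch below follows that rule; the rule and the name canonic_func_counter are assumptions made for illustration, chosen because they reproduce the listing above, and are not taken from the module's source.

```python
def canonic_func_counter(length):
    """Yield one canonical labelling per group of equivalent sequences."""
    def extend(prefix, highest):
        if len(prefix) == length:
            yield list(prefix)
            return
        # A new element may reuse any label seen so far (0 .. highest)
        # or introduce the next unused label (highest + 1).
        for value in range(highest + 2):
            yield from extend(prefix + [value], max(highest, value))

    if length > 0:
        yield from extend([0], 0)

sequences = list(canonic_func_counter(4))
print(len(sequences))
print(sequences[:5])
```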

Selection of the "optimal" object

Many machine learning techniques generate a set of different solutions or have to choose between different attributes, as for instance in classification tree induction. The most trivial solution is to iterate through the candidates, compare them and remember the optimal one. A problem occurs, however, when multiple candidates are equally good: naive approaches would select the first or the last one, depending upon the formulation of the if-statement.

orngMisc provides a class that makes a random choice in such cases. Each new candidate is compared with the currently optimal one; it replaces the optimal one if it is better, while if they are equal, one of them is chosen at random. The number of competing optimal candidates is stored, so in this random choice the probability of selecting the new candidate (over the current one) is 1/w, where w is the current number of equal candidates, including the present one. One can easily verify that this gives equal chances to all candidates, independent of the order in which they are presented.
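The claim of equal chances can be checked with exact arithmetic. Among n equally good candidates, the i-th one must be chosen when it arrives (probability 1/i) and must then survive every later arrival (probability 1 - 1/w for the w-th candidate). The Python 3 sketch below multiplies these factors; the function name is hypothetical and the code only illustrates the argument in the text.

```python
from fractions import Fraction

def win_probability(i, n):
    """Probability that the i-th of n equally good candidates ends up the winner."""
    p = Fraction(1, i)               # chosen on arrival with probability 1/w, w = i
    for w in range(i + 1, n + 1):
        p *= 1 - Fraction(1, w)      # not replaced when the w-th candidate arrives
    return p

probabilities = [win_probability(i, 5) for i in range(1, 6)]
print(probabilities)   # every entry equals Fraction(1, 5)
```

The product telescopes: (1/i) * (i/(i+1)) * ... * ((n-1)/n) = 1/n, regardless of i.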

BestOnTheFly

A class which finds the optimal object in a sequence of objects. The class is fed the candidates one by one, and remembers the "winner". It can thus be used by methods that generate different "solutions" to a problem and need to select the optimal one, but do not want to store them all.

Methods

__init__(compare = cmp, seed = 0, callCompareOn1st = False)
The constructor can be given a compare function and a random seed; both arguments are optional. The default comparison function assumes that the objects can be compared by the usual comparison operators. If the seed is not given, a seed of 0 is used to ensure that the same experiment always gives the same results, despite pseudo-randomness. If callCompareOn1st is set, BestOnTheFly will assume that the candidates are lists or tuples, and it will call compare on the first element of each candidate.
candidate(x)
Candidates are fed to the method candidate, which accepts a single argument (the candidate).
winner()
Returns the (currently) optimal object. This function can be called any number of times, even when the candidates are still coming - it doesn't "collapse" anything.
winnerIndex()
Returns the index of the optimal object within the sequence of the candidates.
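To show how these methods could fit together, here is a simplified Python 3 sketch of such a class. It is not the orngMisc implementation: the class name, the attribute names and the snake_case method winner_index are hypothetical, the callCompareOn1st option is omitted, and the default comparison is written out because Python 3 has no built-in cmp.

```python
import random

class BestOnTheFlySketch:
    """Keeps the best candidate seen so far, breaking ties at random."""

    def __init__(self, compare=None, seed=0):
        # compare(a, b) > 0 means that a is better than b.
        self.compare = compare or (lambda a, b: (a > b) - (a < b))
        self.rng = random.Random(seed)   # fixed seed gives repeatable results
        self.best = None
        self.best_index = -1
        self.ties = 0                    # number of equally good candidates so far
        self.index = -1                  # index of the most recent candidate

    def candidate(self, x):
        self.index += 1
        # The very first candidate always becomes the current winner.
        result = 1 if self.ties == 0 else self.compare(x, self.best)
        if result > 0:
            self.best, self.best_index, self.ties = x, self.index, 1
        elif result == 0:
            self.ties += 1
            # Replace the current winner with probability 1/w, w = self.ties.
            if self.rng.randrange(self.ties) == 0:
                self.best, self.best_index = x, self.index

    def winner(self):
        return self.best

    def winner_index(self):
        return self.best_index

finder = BestOnTheFlySketch()
for value in [3, 7, 2, 7, 5]:
    finder.candidate(value)
print(finder.winner())         # 7
print(finder.winner_index())   # index of one of the two 7s, chosen at random
```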

selectBest(l, compare = cmp, seed = 0, callCompareOn1st = False)

Function selectBest (it is a separate function, not a method of BestOnTheFly) returns the optimal object from list l. The function can be used if the candidates are already in the list, so using the more complicated BestOnTheFly directly is not needed.

To demonstrate the use of BestOnTheFly and present selectBest: here's how selectBest is implemented.

def selectBest(x, compare=cmp, seed=0, callCompareOn1st=False):
    bs = BestOnTheFly(compare, seed, callCompareOn1st)
    for i in x:
        bs.candidate(i)
    return bs.winner()

selectBestIndex(l, compare=cmp, seed=0, callCompareOn1st = False)

Similar to selectBest except that it doesn't return the best object but its index in the list l.

compare2_bigger, compare2_smaller

These two functions can be used as compare operators for the above class and functions. With the first, the largest object is selected; with the second, the smallest (the functions are equal to cmp and -cmp, respectively).

compare2_firstBigger, compare2_firstSmaller, compare2_lastBigger, compare2_lastSmaller

These functions assume that the objects being compared are lists (or tuples). To compare objects x and y, they compare x[0] with y[0] (the first two functions) or x[-1] with y[-1] (the other two).
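The behaviour described for these compare functions can be sketched in a few lines. The code below targets Python 3, which no longer has the built-in cmp, so a replacement is defined first; all function names here are hypothetical stand-ins, not the orngMisc functions themselves.

```python
def cmp(a, b):
    """Three-way comparison: negative, zero or positive, like Python 2's cmp."""
    return (a > b) - (a < b)

def bigger(x, y):
    """Larger objects win."""
    return cmp(x, y)

def smaller(x, y):
    """Smaller objects win."""
    return -cmp(x, y)

def first_bigger(x, y):
    """Compare sequences by their first elements; the larger one wins."""
    return cmp(x[0], y[0])

def last_smaller(x, y):
    """Compare sequences by their last elements; the smaller one wins."""
    return -cmp(x[-1], y[-1])

print(bigger(2, 1), smaller(2, 1))
print(first_bigger((0.3, "a"), (0.1, "b")))
print(last_smaller(("a", 1), ("b", 2)))
```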

Example

The following snippet loads the data set lymphography and prints out the attribute with the highest gain ratio.

part of misc_bestOnTheFly.py

import orange, orngMisc

data = orange.ExampleTable("lymphography")

findBest = orngMisc.BestOnTheFly(callCompareOn1st = True)
for attr in data.domain.attributes:
    findBest.candidate((orange.MeasureAttribute_gainRatio(attr, data), attr))

print findBest.winner()

Our candidates are tuples of gain ratios and attributes, so we set callCompareOn1st to make the compare function compare the first elements (the gain ratios). We could achieve the same by initializing the object like this:

findBest = orngMisc.BestOnTheFly(orngMisc.compare2_firstBigger)

The other way to do it is through indices.

part of misc_bestOnTheFly.py

findBest = orngMisc.BestOnTheFly()
for attr in data.domain.attributes:
    findBest.candidate(orange.MeasureAttribute_gainRatio(attr, data))

bestIndex = findBest.winnerIndex()
print data.domain[bestIndex],", ", findBest.winner()

Here we only give gain ratios to BestOnTheFly, so we don't have to specify a special compare operator. After checking all attributes we get the index of the optimal one by calling winnerIndex.

Other functions

frange(start, stop, step)

This function is equivalent to the built-in range, except that start, stop and step can be continuous (float) and that it returns the values between start and stop, including stop (e.g., frange(0, 1, 0.5) returns [0.0, 0.5, 1.0]).

Default arguments are somewhat peculiar.

  • frange() is equivalent to frange(0, 1, 0.1)
  • frange(x) is equivalent to frange(x, 1, x), for instance, frange(0.25) gives [0.25, 0.5, 0.75, 1.0]
  • frange(x, y) is equivalent to frange(0, x, y), therefore frange(3, .5) returns [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
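The defaults listed above can be summarised in a short Python 3 sketch. It is written to match the documented behaviour only and is not claimed to be the module's code; the name frange_sketch, the rounding tolerance and the check for a non-positive step are assumptions.

```python
def frange_sketch(*args):
    """Return floats from start to stop (inclusive) in increments of step."""
    start, stop, step = 0.0, 1.0, 0.1       # frange() == frange(0, 1, 0.1)
    if len(args) == 1:
        start = step = args[0]              # frange(x) == frange(x, 1, x)
    elif len(args) == 2:
        stop, step = args                   # frange(x, y) == frange(0, x, y)
    elif len(args) == 3:
        start, stop, step = args
    elif len(args) > 3:
        raise TypeError("expected at most 3 arguments")
    if step <= 0:
        raise ValueError("step must be positive")

    values = []
    i = 0
    # A small tolerance keeps stop in the result despite rounding errors.
    while start + i * step <= stop + 1e-10:
        values.append(float(start + i * step))
        i += 1
    return values

print(frange_sketch(0, 1, 0.5))   # [0.0, 0.5, 1.0]
print(frange_sketch(0.25))        # [0.25, 0.5, 0.75, 1.0]
print(frange_sketch(3, .5))       # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

Computing each value as start + i * step, rather than adding step repeatedly, avoids accumulating rounding errors over long ranges.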

printVerbose(text[, verb])

Prints out the given text if verb is True. The second argument can be omitted (this is generally recommended); in this case, the text is printed if orngMisc.verbose is True.

This function supports a general "verbose mode" for the application. Functions that have some verbose info to print out should do so through orngMisc.printVerbose.

Most modules sadly ignore this function and have their individual "verbose" arguments.

getObjectName(x, default="")

Returns a suitable name for object x by checking whether it has one of the attributes name, shortDescription, description, func_doc or func_name, in that order. If it has none of these but is a class instance, the function returns the class name. If that fails as well, it returns the value of the argument default.

The function is used primarily for getting the name of a learner or classifier.

demangleExamples(x)

A helper for functions (mostly in orngTest) that accept either an example generator or a tuple (example generator, weight) as an argument. demangleExamples accepts either of the two and always returns a tuple (with the weight meta id set to 0 if none was given).
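The normalisation this function performs can be sketched as follows. This is a hedged Python 3 illustration with a hypothetical name; it assumes that the argument is a tuple exactly when a weight was supplied, and plain lists stand in for example generators.

```python
def demangle_examples_sketch(x):
    """Return an (examples, weight) pair, using weight 0 when none was given."""
    if isinstance(x, tuple):
        return x          # already in (examples, weight) form
    return x, 0           # bare examples: attach the default weight id 0

examples = ["e1", "e2", "e3"]   # stand-in for an example generator
print(demangle_examples_sketch(examples))
print(demangle_examples_sketch((examples, 5)))
```

Callers can then always unpack the result as `examples, weight = ...` without checking which form they were given.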