Classes for Finding Nearest Neighbours

Orange provides classes for finding the nearest neighbours of a given reference example. Smarter classes may be added in the future; for now there are only two kinds: abstract classes that define the general behaviour of neighbour-searching classes, and classes that implement brute-force search.

As usual in Orange, there is a pair of classes: a class that does the work (FindNearest) and a class that constructs it (FindNearestConstructor). Construction is the "learning" step: the constructor takes the examples and arranges them in a data structure suited to searching. These two classes are abstract; the classes you can actually use are FindNearest_BruteForce and FindNearestConstructor_BruteForce.


FindNearest

FindNearest returns the nearest neighbours of a given example. The class is abstract, and it is up to derived classes to decide how the examples are stored and searched. Note that the class does not even have a component for measuring distances between examples; derived classes can either store such a component or store the examples in a way that makes it unnecessary.

Attributes

distanceID
ID of the meta attribute in which to store the distance from the reference example; if 0 (default), distances are not stored.
includeSame
Tells whether to include examples that are the same as the reference; default is true.

Methods

__call__(example, k[, needsClass])

Returns an ordered list of k nearest neighbours of example. If examples are weighted, the length of the list is not necessarily k, but the total weight of the returned examples will be as close to k as possible. (If there are enough examples, the total weight will be at least k.)

If k is zero, all examples are returned, but sorted by their distances to example.

If the third argument, needsClass, is given and true, FindNearest ignores all examples with a missing class value. If the domain is classless, this flag has no effect.
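The counting rule for weighted examples can be illustrated with a small sketch in plain Python. It is not Orange's implementation: the function name take_k_by_weight and the (distance, weight, item) tuples are assumptions made for this illustration, and "as close to k as possible" is read here as "the shortest prefix of the sorted list whose total weight reaches k".

```python
def take_k_by_weight(sorted_neighbours, k):
    """Return the nearest entries whose total weight first reaches k.

    sorted_neighbours: list of (distance, weight, item) tuples, nearest first.
    k == 0 means "return everything", as described for __call__.
    This is a hypothetical helper, not part of Orange.
    """
    if k == 0:
        return list(sorted_neighbours)

    chosen = []
    total_weight = 0.0
    for entry in sorted_neighbours:
        if total_weight >= k:
            break  # enough weight collected; stop before adding more
        chosen.append(entry)
        total_weight += entry[1]
    return chosen


neighbours = [
    (0.0, 0.5, "a"),
    (1.0, 1.0, "b"),
    (1.5, 2.0, "c"),
    (2.0, 1.0, "d"),
]
for distance, weight, item in take_k_by_weight(neighbours, 2):
    print(item, distance, weight)
```

With k = 2 the sketch returns three entries, because the first two carry a total weight of only 1.5. With unit weights it reduces to taking the first k entries.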

FindNearestConstructor

The job of FindNearestConstructor is to construct an instance of FindNearest.

Attributes

distanceConstructor
A component of class ExamplesDistanceConstructor that "learns" to measure distances between examples. Learning can be, for example, storing the ranges of continuous attributes or the number of values of a discrete attribute (see the page about measuring distances for more information). The result of learning is an instance of ExamplesDistance that is then used for measuring distances between examples.
includeSame
Tells whether to include examples that are the same as the reference; default is true.

Methods

__call__(examples, weightID, distanceID)
Constructs an instance of FindNearest that returns the neighbours of a given example. The instance obeys weightID when counting neighbours (some distance measures may take weights into account as well) and stores the distances in the meta attribute with ID distanceID.
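The idea of a distance constructor that "learns" from the data can be sketched in plain Python. The class below is a hypothetical stand-in, not Orange's ExamplesDistanceConstructor: it assumes all attributes are continuous, stores only their ranges, and returns a function that measures range-normalised Euclidean distance. Whether Orange normalises in exactly this way is not stated on this page.

```python
import math


class NormalisedEuclideanConstructor:
    """Hypothetical distance constructor: learns the range of each attribute."""

    def __call__(self, rows):
        # "Learning": record the span (max - min) of every column.
        columns = list(zip(*rows))
        # Avoid division by zero for constant attributes.
        spans = [(max(col) - min(col)) or 1.0 for col in columns]

        def distance(a, b):
            # Euclidean distance on attributes scaled by the learned spans.
            return math.sqrt(
                sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, spans))
            )

        return distance


rows = [(0.0, 10.0), (2.0, 30.0), (4.0, 20.0)]
measure = NormalisedEuclideanConstructor()(rows)
print(measure((0.0, 10.0), (2.0, 10.0)))
```

The constructor is called once with the data; the returned measure can then be reused for every query, which is the same division of labour as between FindNearestConstructor and FindNearest.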

FindNearest_BruteForce

A class for brute-force search for nearest neighbours. It stores a table of examples (its own copy of the examples, not merely an ExampleTable with references to another ExampleTable). When asked for neighbours, it measures the distances to all examples, stores them in a heap and returns the first k as an ExampleTable with references to the examples stored in FindNearest_BruteForce's field examples.
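The search strategy described above (measure every distance, put the results in a heap, pop the first k) can be sketched with Python's standard heapq module. The function brute_force_nearest and its arguments are assumptions for this illustration and do not mirror Orange's internals; in particular, ties are broken here by position in the list, and weights are ignored.

```python
import heapq


def brute_force_nearest(examples, reference, k, distance):
    """Return the k examples closest to reference; k == 0 returns all, sorted.

    Hypothetical helper: distance is any callable taking two examples.
    """
    # Measure the distance to every stored example.
    heap = [(distance(reference, ex), index) for index, ex in enumerate(examples)]
    heapq.heapify(heap)

    count = len(heap) if k == 0 else min(k, len(heap))
    nearest = []
    for _ in range(count):
        _, index = heapq.heappop(heap)  # smallest remaining distance
        nearest.append(examples[index])
    return nearest


points = [(5.0,), (1.0,), (3.0,), (2.0,)]
absolute = lambda a, b: abs(a[0] - b[0])
print(brute_force_nearest(points, (0.0,), 2, absolute))
```

The returned list holds the original objects from points rather than copies, loosely analogous to returning a table of references to the stored examples.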

Attributes

distance
a component that measures distances between examples
examples
a stored list of examples
weightID
ID of meta attribute with weight

FindNearestConstructor_BruteForce

A class that constructs FindNearest_BruteForce. It calls the inherited distanceConstructor and then passes the constructed distance measure, along with examples, weightID and distanceID, to the newly constructed instance of FindNearest_BruteForce.

If several examples at the same distance compete for the last places, the tie is resolved by randomly picking the required number of them. A local random generator is constructed and initialised with a constant computed from the reference example. As a result, the same random neighbours are chosen for a given example each time FindNearest_BruteForce is called.
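A reproducible tie-break of this kind can be sketched in plain Python. The names below are hypothetical, and the way the seed is derived (a CRC32 checksum of the reference example's textual form) is an assumption; the page does not say how Orange computes its constant.

```python
import random
import zlib


def pick_with_ties(scored, k, reference):
    """Pick k items from (distance, item) pairs sorted by distance.

    Items tied at the cut-off distance are sampled with a local generator
    seeded from the reference, so repeated calls give the same answer.
    """
    if k == 0 or k >= len(scored):
        return [item for _, item in scored]

    cutoff = scored[k - 1][0]
    closer = [item for d, item in scored if d < cutoff]
    tied = [item for d, item in scored if d == cutoff]

    # Assumed seeding scheme: a checksum of the reference's repr().
    seed = zlib.crc32(repr(reference).encode("utf-8"))
    rng = random.Random(seed)  # local generator; global state is untouched
    return closer + rng.sample(tied, k - len(closer))


scored = [(0.0, "a"), (1.0, "b"), (1.0, "c"), (1.0, "d"), (1.0, "e"), (2.0, "f")]
reference = ("young", "myope", "no", "reduced", "none")
print(pick_with_ties(scored, 3, reference))
```

Using a local generator rather than the module-level one keeps the choice independent of any other random calls made by the program.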

The following script shows how to find the five nearest neighbours of the first example in the lenses dataset.

examplesdistance.py (uses lenses.tab)

import orange

data = orange.ExampleTable("lenses")

nnc = orange.FindNearestConstructor_BruteForce()
nnc.distanceConstructor = orange.ExamplesDistanceConstructor_Euclidean()

did = orange.newmetaid()
nn = nnc(data, 0, did)

print "*** Reference example: ", data[0]
for ex in nn(data[0], 5):
    print ex

First, we prepare a constructor nnc and tell it that we want to use Euclidean distance. Then we obtain a new ID for a meta-attribute and call the constructor. The resulting object nn is trained to find neighbours of examples in the lenses dataset. Finally, in the for loop, we call nn to find the five nearest neighbours of data[0] and print them out.

*** Reference example: ['young', 'myope', 'no', 'reduced', 'none']
['young', 'myope', 'no', 'reduced', 'none'], {-2:0.00}
['young', 'myope', 'yes', 'reduced', 'none'], {-2:1.00}
['young', 'hypermetrope', 'no', 'reduced', 'none'], {-2:1.00}
['pre-presbyopic', 'myope', 'no', 'reduced', 'none'], {-2:1.00}
['presbyopic', 'myope', 'no', 'reduced', 'none'], {-2:1.00}

All returned examples here have a new meta-attribute with id -2 that shows the distance from the reference example.