k Nearest Neighbours

kNN is one of the simplest learning techniques: the learner merely stores the examples, while the classifier does all its work by finding the stored examples most similar to the example being classified.

Classes for kNN learner and classifier are strongly related to classes for measuring distances, described on a separate page.


kNNLearner

Attributes

k
Number of neighbours. If set to 0 (which is also the default value), the square root of the number of examples is used. Changed: the default used to be 1.
rankWeight
Enables weighting by ranks (default: true).
distanceConstructor
A component that constructs the object for measuring distances between examples.

kNNLearner first constructs an object for measuring distances between examples: distanceConstructor is used if given; otherwise, the Euclidean metric is used. kNNLearner then constructs an instance of FindNearest_BruteForce. Together with the ID of the meta attribute holding example weights, k and rankWeight, this is passed to a kNNClassifier.
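The division of labour between learner and classifier can be sketched in plain Python. This is an illustration of the idea, not the orange implementation; the function names are ours, and the default behaviours (Euclidean metric, k = square root of the number of examples) follow the description above.

```python
import math

def knn_learner(examples, k=0, distance=None):
    """Plain-Python analogue of the flow described above (not the orange API).
    `examples` is a list of (attribute_tuple, class) pairs. The learner only
    stores the examples and fixes the settings; all the work happens at
    classification time, inside the returned closure."""
    if distance is None:
        # default: Euclidean metric over numeric attribute vectors
        distance = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    if k == 0:
        # default k: square root of the number of training examples
        k = int(round(math.sqrt(len(examples))))

    def classifier(example):
        # find the k stored examples nearest to the example being classified
        neighbours = sorted(examples, key=lambda ex: distance(ex[0], example))[:k]
        # unweighted majority vote (the weighted vote is described further below)
        votes = {}
        for attrs, cls in neighbours:
            votes[cls] = votes.get(cls, 0) + 1
        return max(votes, key=votes.get)

    return classifier
```

The closure plays the role of kNNClassifier: it is cheap to construct and expensive to call, which is the defining trade-off of lazy learners like kNN.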

kNNClassifier

Attributes

findNearest
A component that finds nearest neighbours of a given example.
k
Number of neighbours. If set to 0 (which is also the default value), the square root of the number of examples is used. Changed: the default used to be 1.
rankWeight
Enables weighting by ranks (default: true).
weightID
ID of meta attribute with weights of examples
nExamples
The number of learning examples. It is used to compute the number of neighbours if k is zero.

When called to classify an example, the classifier first calls findNearest to retrieve a list with k nearest neighbours. The component findNearest has a stored table of examples (those that have been passed to the learner) together with their weights. If examples are weighted (non-zero weightID), weights are considered when counting the neighbours.
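A brute-force search in the spirit of FindNearest_BruteForce can be sketched as follows. This is an illustrative stand-in, not the actual orange component; in particular, the convention that weighted examples count towards k according to their weight is our reading of the paragraph above.

```python
def find_nearest(stored, distance, example, k):
    """Illustrative brute-force neighbour search (not the orange
    implementation). `stored` is a list of (attributes, class, weight)
    triples. Neighbours are collected in order of increasing distance
    until their cumulative weight reaches k, so an example with weight 2
    counts as two neighbours."""
    by_distance = sorted(stored, key=lambda ex: distance(ex[0], example))
    neighbours, total_weight = [], 0.0
    for ex in by_distance:
        if total_weight >= k:
            break
        neighbours.append(ex)
        total_weight += ex[2]
    return neighbours
```

With all weights equal to 1 this reduces to the usual "k nearest examples".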

If findNearest returns only one neighbour (this is the case if k=1), kNNClassifier returns the neighbour's class.

Otherwise, the retrieved neighbours vote on the class prediction (or the probabilities of classes). The vote is weighted in two ways. First, if examples are weighted, their weights are respected. Second, nearer neighbours have a greater impact on the prediction: the weight of an example is computed as exp(-t^2/s^2), where the meaning of t depends on the setting of rankWeight.

  • if rankWeight is false, t is the distance from the example being classified;
  • if rankWeight is true, neighbours are ordered by distance and t is the position of the neighbour on the list (its rank).

In both cases, s is chosen so that the impact of the farthest example is 0.001.

Weighting gives the classifier a certain insensitivity to the number of neighbours used, making it possible to use large values of k.

The classifier can handle continuous and discrete attributes, and can even distinguish between ordinal and nominal attributes. See the page on distance measuring for details.

We will test the learner on the 'iris' dataset. We shall split it into a training set (80%) and a test set (20%), learn on the training examples and test on five randomly selected test examples.

part of knnlearner.py (uses iris.tab)

>>> import orange, orngTest, orngStat
>>> data = orange.ExampleTable("iris")
>>>
>>> rndind = orange.MakeRandomIndices2(data, p0=0.8)
>>> train = data.select(rndind, 0)
>>> test = data.select(rndind, 1)
>>>
>>> knn = orange.kNNLearner(train, k=10)
>>> for i in range(5):
...     example = test.randomexample()
...     print example.getclass(), knn(example)
...
Iris-setosa Iris-setosa
Iris-versicolor Iris-versicolor
Iris-versicolor Iris-versicolor
Iris-setosa Iris-setosa
Iris-setosa Iris-setosa

The secret of kNN's success is that the examples in the iris dataset appear in three well separated clusters. The classifier's accuracy will remain excellent even with a very large or a very small number of neighbours.

As many experiments have shown, the choice of distance measure does not have a large or predictable effect on the performance of kNN classifiers, so there is not much point in changing the default. If you decide to do so anyway, set distanceConstructor to an instance of one of the classes for distance measuring.

>>> knn = orange.kNNLearner()
>>> knn.k = 10
>>> knn.distanceConstructor = orange.ExamplesDistanceConstructor_Hamming()
>>> knn = knn(train)
>>> for i in range(5):
...     example = test.randomexample()
...     print example.getclass(), knn(example)
...
Iris-virginica Iris-versicolor
Iris-setosa Iris-setosa
Iris-versicolor Iris-versicolor
Iris-setosa Iris-setosa
Iris-setosa Iris-setosa

Almost perfect. Told you.