This module contains tools for performing principal component analysis (PCA) on data stored in an ExampleTable.
PCA(dataset = None, attributes = None, rows = None, standardize = 0, imputer = defaultImputer, continuizer = defaultContinuizer, maxNumberOfComponents = 10, varianceCovered = 0.95, useGeneralizedVectors = 0)
dataset: ExampleTable instance on which PCA will be performed. If None, only the parameters are set and a PCA instance is returned. The projection can then be performed later, once data is supplied.
attributes: attributes of the ExampleTable instance to use; there should be at least two. If None, the whole domain is used. An example is given in PCA1.py.
rows: True/False array or list with the same length as the number of examples in the ExampleTable instance. Only examples that correspond to True will be used for projection. If None, all data is used. An example is given in PCA1.py.
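To illustrate how a True/False list picks out examples, here is a minimal NumPy sketch. It does not use Orange; the array `data` and the variable names are made up for the illustration.

```python
import numpy as np

# Hypothetical data: 5 examples (rows), 2 attributes (columns).
data = np.array([[1.0, 2.0],
                 [3.0, 4.0],
                 [5.0, 6.0],
                 [7.0, 8.0],
                 [9.0, 10.0]])

# One True/False entry per example, same length as the data.
rows = [True, False, True, True, False]
assert len(rows) == len(data)

# Only the examples whose entry is True are kept.
selected = data[np.array(rows, dtype=bool)]
print(selected.shape)  # (3, 2)
```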
standardize: if True, the data is standardized before projection.
imputer: orange.Imputer instance. Defines how data is imputed if values are missing. Must NOT be trained. The default is average imputation.
continuizer: orange.Continuizer instance. Defines how data is continuized. Default values:
- Multinomial -> as normalized ordinal
- Class -> ignore
- Continuous -> leave
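The two default preprocessing steps can be pictured with a small NumPy sketch. It is not Orange's implementation: the helper names are hypothetical, and "normalized ordinal" is assumed here to mean mapping the i-th of k ordered values to i / (k - 1).

```python
import numpy as np

def impute_average(column):
    """Replace missing values (NaN) with the mean of the known values."""
    column = np.asarray(column, dtype=float)
    mean = np.nanmean(column)
    return np.where(np.isnan(column), mean, column)

def normalized_ordinal(values, categories):
    """Map each category to index / (number of categories - 1), i.e. into [0, 1].

    Assumes at least two categories, listed in their ordinal order.
    """
    index = {name: i for i, name in enumerate(categories)}
    top = len(categories) - 1
    return np.array([index[v] / top for v in values], dtype=float)

ages = impute_average([20.0, np.nan, 40.0])
sizes = normalized_ordinal(["small", "large", "medium"],
                           ["small", "medium", "large"])
print(ages.tolist())   # [20.0, 30.0, 40.0]
print(sizes.tolist())  # [0.0, 1.0, 0.5]
```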
An example of using your own imputer and continuizer is given in PCA2.py.
useGeneralizedVectors: if True, generalized vectors are used. An example is given in PCA3.py.
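As a rough picture of how the parameters above could interact, the following NumPy sketch computes eigenvalues and loadings and keeps only the leading components. It is an assumption-laden illustration, not the module's code: it assumes that maxNumberOfComponents caps the number of components and that varianceCovered is a threshold on the cumulative share of variance. The function and argument names are hypothetical.

```python
import numpy as np

def select_components(data, standardize=False, max_components=10,
                      variance_covered=0.95):
    """Return eigenvalues and loadings of the leading principal components."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    if standardize:
        # Scale every attribute to unit sample standard deviation.
        centered = centered / data.std(axis=0, ddof=1)
    cov = np.cov(centered, rowvar=False)
    evalues, evectors = np.linalg.eigh(cov)   # returned in ascending order
    order = np.argsort(evalues)[::-1]         # largest eigenvalue first
    evalues, evectors = evalues[order], evectors[:, order]
    cumulative = np.cumsum(evalues) / evalues.sum()
    # Smallest number of components whose cumulative share reaches the threshold.
    needed = int(np.searchsorted(cumulative, variance_covered)) + 1
    k = min(needed, max_components, len(evalues))
    return evalues[:k], evectors[:, :k]

# Hypothetical data: two perfectly correlated attributes plus a tiny third one.
t = np.linspace(-1.0, 1.0, 50)
data = np.column_stack([t, 2.0 * t, 0.01 * np.sin(7.0 * t)])
evalues, loadings = select_components(data)
print(len(evalues), loadings.shape)
```

With this data a single component already covers more than 95% of the variance, so only one is kept. Standardizing first gives the small third attribute the same weight as the others, and more components are needed.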
An object of the PCAClassifier class is returned when PCA is performed successfully. It contains the domain of the data on which PCA was performed, the imputer and continuizer for use in projection, and the center, deviation, evalues and loadings of the PCA. It also stores the data of the last projection performed, for use with biplot.
A summary of the projection can be obtained by printing the PCAClassifier instance after the PCA projection has completed successfully.
Projection can be performed by calling the PCAClassifier instance with an ExampleTable instance. Projection will fail if the domain of the ExampleTable instance is not the same as that of the training set (however, the attributes do not have to be in the same order).
The new ExampleTable instance that is returned contains the projected data, with domain PC+N, where N goes from 1 to the number of components.
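The arithmetic behind such a projection can be sketched in NumPy as follows. This is an illustration under assumptions, not the PCAClassifier code: the function name is hypothetical, loadings are assumed to be stored as an (attributes x components) matrix, and deviation is assumed to be None when no standardization was requested.

```python
import numpy as np

def project(data, center, deviation, loadings):
    """Center, optionally scale, and rotate data onto the principal components."""
    data = np.asarray(data, dtype=float)
    center = np.asarray(center, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    if data.shape[1] != len(center):
        raise ValueError("data must have the same attributes as the training set")
    scaled = data - center
    if deviation is not None:
        scaled = scaled / np.asarray(deviation, dtype=float)
    scores = scaled @ loadings
    # New attributes are named PC1, PC2, ... up to the number of components.
    names = ["PC%d" % (i + 1) for i in range(loadings.shape[1])]
    return names, scores

# Hypothetical fitted values: data centered at (1, 1), one component along the diagonal.
center = [1.0, 1.0]
loadings = np.array([[1.0], [1.0]]) / np.sqrt(2.0)
names, scores = project([[2.0, 2.0], [1.0, 1.0]], center, None, loadings)
print(names)   # ['PC1']
print(scores)  # first row is about 1.414, second row is 0.0
```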
Matplotlib is needed.
plot: creates a scree plot for the current PCA.
title: title of the scree plot
filename: path and filename where the figure should be saved. If None, the figure is displayed directly.
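A scree plot with the same save-or-display behaviour can be sketched with Matplotlib as below. This is only an illustration: the function name, the default title and the axis labels are assumptions and are not taken from the module.

```python
import os
import tempfile

import matplotlib.pyplot as plt

def scree_plot(evalues, title="Scree plot", filename=None):
    """Plot the share of variance per component; save it or show it."""
    total = float(sum(evalues))
    shares = [value / total for value in evalues]
    numbers = list(range(1, len(shares) + 1))
    fig, ax = plt.subplots()
    ax.plot(numbers, shares, marker="o")
    ax.set_xticks(numbers)
    ax.set_xlabel("Principal component")
    ax.set_ylabel("Proportion of variance")
    ax.set_title(title)
    if filename is None:
        plt.show()             # display the figure directly
    else:
        fig.savefig(filename)  # write the figure to the given path
    plt.close(fig)
    return shares

# Hypothetical eigenvalues; the figure is written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "scree.png")
shares = scree_plot([3.0, 1.0], filename=path)
print(shares)  # [0.75, 0.25]
```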
biplot: creates a biplot for the current projection.
Before calling biplot, at least one projection must be performed (the last performed projection is plotted).
- choices: the two component numbers to plot, the first on the x-axis and the second on the y-axis. Components are numbered from 1 to N, where N is the number of components returned by PCA. Biplot does not work if only one component is available. Only the default is a biplot in the strict sense.
- scale: the transformed data is scaled by lambda ^ scale and the loadings are scaled by 1/(lambda ^ scale), where lambda are the singular values as computed by princomp multiplied by the square root of the data length. Normally scale lies within [0, 1]; a warning is printed if the specified scale is outside this range.
- title and filename: same as for plot
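The scaling described for choices and scale can be written out in NumPy as follows. This is a sketch of the description above, not the module's plotting code. It assumes the singular values equal the square roots of the eigenvalues, and the function name and the default arguments are hypothetical.

```python
import numpy as np

def biplot_coordinates(scores, loadings, evalues, choices=(1, 2), scale=1.0):
    """Return scaled point and arrow coordinates for the two chosen components."""
    if not 0.0 <= scale <= 1.0:
        print("Warning: scale is outside [0, 1]")
    scores = np.asarray(scores, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    evalues = np.asarray(evalues, dtype=float)
    idx = [c - 1 for c in choices]  # components are numbered from 1
    n = scores.shape[0]             # data length
    lam = (np.sqrt(evalues[idx]) * np.sqrt(n)) ** scale
    points = scores[:, idx] * lam   # transformed data scaled by lambda ^ scale
    arrows = loadings[:, idx] / lam # loadings scaled by 1 / (lambda ^ scale)
    return points, arrows

# Hypothetical projection: 4 examples, 2 components, 2 original attributes.
scores = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
loadings = np.eye(2)
points, arrows = biplot_coordinates(scores, loadings, [4.0, 1.0])
print(points[0].tolist())  # [4.0, 2.0]
print(arrows.tolist())     # [[0.25, 0.0], [0.0, 0.5]]
```

With scale set to 0, lambda ^ scale is 1 for every component, so both the points and the arrows are left unscaled.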
An example is given in PCA5.py; its output is stored in the file biplot.png.
defaultImputer: orange.ImputerConstructor_average(dataset).
defaultContinuizer: creates the default continuizer with:
- multinomial -> as normalized ordinal
- class -> ignore
- continuous -> leave
dataMatrix is an instance of numpy.array.