Module orngVizRank implements the VizRank algorithm (Leban et al, 2004; Leban et al, 2005), which ranks possible data projections generated using two visualization methods: scatterplot and radviz. For a given class-labeled data set, VizRank creates different possible data projections and assigns a score of interestingness to each of them. VizRank scores the projections based on how well the different classes are separated in the projection. If the classes are well separated, the projection gets a high score; otherwise the score is correspondingly lower. After evaluation it is sensible to focus on the top-ranked projections, which provide the greatest insight into how to separate the classes.
In the rest of this document we will talk about two visualization methods: scatterplot and radviz. While the scatterplot is a well-known method, radviz is less familiar. Readers interested in this method should see (Hoffman, 1997).
The easiest way to use VizRank in Orange is through Orange widgets. Widgets like Scatterplot, Radviz and Polyviz (found in the Visualize tab in Orange Canvas) contain a "VizRank" button, which opens VizRank's dialog where you can change the settings and find interesting data projections.
A more advanced user, however, will perhaps also want to use VizRank in scripts. These users will use the orngVizRank module.
In the rest of this document we describe only how to use VizRank in scripts. For those who will use VizRank in Orange widgets, we have provided extensive tooltips that should clarify the meaning of the different settings.
Creating a VizRank instance
First let's show a very simple example of how we can use VizRank in scripts:
In this example we created a VizRank instance, evaluated scatterplot projections of the UCI wine data set and printed the information about the best-ranked projection. The best projection scored 86.88 (on a scale from 0 to 100) and shows attributes 'A7' and 'A10'. The result list also holds other information for each projection, but it is not relevant for a casual user.
Below is a list of functions and settings that can be used to modify VizRank's behaviour.
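To make the ranking idea concrete, here is a minimal pure-Python sketch of how scatterplot projections could be scored and sorted. It does not use the orngVizRank module and is not its implementation: the function names, the toy data and the choice of leave-one-out k-NN on unscaled values are all assumptions made for illustration.

```python
from itertools import combinations
import math

def knn_average_correct(points, labels, k):
    """Leave-one-out k-NN: mean share of neighbours with the true class, 0..100."""
    total = 0.0
    for i, (p, label) in enumerate(zip(points, labels)):
        # Distances from point i to every other point in this projection.
        others = sorted(
            (math.dist(p, q), labels[j])
            for j, q in enumerate(points) if j != i
        )
        neighbours = [lab for _, lab in others[:k]]
        total += neighbours.count(label) / len(neighbours)
    return 100.0 * total / len(points)

def rank_scatterplots(rows, labels, names, k=2):
    """Score every attribute pair; return (score, (name_a, name_b)), best first."""
    results = []
    for a, b in combinations(range(len(names)), 2):
        points = [(row[a], row[b]) for row in rows]
        score = knn_average_correct(points, labels, k)
        results.append((score, (names[a], names[b])))
    results.sort(key=lambda r: r[0], reverse=True)
    return results

# Hypothetical toy data: "x" and "z" separate the classes, "y" is noise.
rows = [(0.0, 5.0, 1.0), (0.1, 1.0, 1.2), (0.2, 4.0, 0.9),
        (1.0, 2.0, 3.0), (1.1, 5.5, 3.1), (1.2, 0.5, 2.9)]
labels = ["a", "a", "a", "b", "b", "b"]
names = ["x", "y", "z"]

results = rank_scatterplots(rows, labels, names)
print(results[0])
```

On this toy data the pair ('x', 'z') comes out on top, because every point's nearest neighbours in that projection share its class.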
Possible values are classification accuracy (CLASS_ACCURACY), average probability of correct classification (AVERAGE_CORRECT) or Brier score (BRIER_SCORE). Default: AVERAGE_CORRECT
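The three measures named above can be illustrated with a short sketch that computes them from predicted class probabilities. The function name, the dictionary keys and the sample predictions are hypothetical, and the multi-class form of the Brier score used here is an assumption; how VizRank turns a Brier score (where lower is better) into a projection score is not covered.

```python
def quality_measures(probabilities, true_classes):
    """probabilities: one dict per example, mapping class -> predicted probability."""
    n = len(true_classes)
    correct = 0
    average_correct = 0.0
    brier = 0.0
    for probs, true in zip(probabilities, true_classes):
        # Classification accuracy: was the most probable class the true one?
        predicted = max(probs, key=probs.get)
        correct += predicted == true
        # Average probability of correct classification.
        average_correct += probs.get(true, 0.0)
        # Brier score: squared error against the 0/1 indicator of the true class.
        brier += sum((p - (1.0 if c == true else 0.0)) ** 2
                     for c, p in probs.items())
    return {
        "class_accuracy": correct / n,
        "average_correct": average_correct / n,
        "brier_score": brier / n,
    }

# Hypothetical predictions for three examples.
probabilities = [{"a": 0.75, "b": 0.25},
                 {"a": 0.25, "b": 0.75},
                 {"a": 0.75, "b": 0.25}]
true_classes = ["a", "b", "b"]

scores = quality_measures(probabilities, true_classes)
print(scores)
```

Note that accuracy only counts hits, while the other two measures also reward or penalise how confident each prediction was.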
Possible values are leave one out (LEAVE_ONE_OUT), 10 fold cross validation (TEN_FOLD_CROSS_VALIDATION) or testing on the learning set (TEST_ON_LEARNING_SET). Default: TEN_FOLD_CROSS_VALIDATION
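The difference between the three testing methods is easiest to see in how they split example indices into learning and testing sets. The generators below are a sketch under simplifying assumptions (no shuffling, no stratification) with hypothetical names; they are not taken from Orange.

```python
def leave_one_out(n):
    """Each example is tested once, using all other examples for learning."""
    for i in range(n):
        yield [j for j in range(n) if j != i], [i]

def k_fold(n, folds=10):
    """Every fold is tested once; the remaining folds are used for learning."""
    for f in range(folds):
        test = list(range(f, n, folds))
        test_set = set(test)
        yield [j for j in range(n) if j not in test_set], test

def test_on_learning_set(n):
    """A single split in which the same examples are used for both."""
    yield list(range(n)), list(range(n))

for train, test in k_fold(25, folds=10):
    print(len(train), len(test))
```

Leave one out gives the most thorough estimate but needs one evaluation per example, which is why a fold-based split is a common compromise on larger data sets.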
Possible values are ReliefF (CONT_MEAS_RELIEFF), Signal to Noise (CONT_MEAS_S2N), a modification of the Signal to Noise measure (CONT_MEAS_S2NMIX) or no measure (CONT_MEAS_NONE). Default: CONT_MEAS_RELIEFF
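As an illustration of ranking continuous attributes, the sketch below uses the common two-class signal-to-noise definition, |mean_a - mean_b| / (sd_a + sd_b). This formula, the use of the population standard deviation and all names are assumptions; the exact computation behind CONT_MEAS_S2N and its handling of more than two classes may differ.

```python
from statistics import mean, pstdev

def signal_to_noise(values, labels, class_a, class_b):
    """Two-class signal to noise: difference of means over summed deviations."""
    a = [v for v, lab in zip(values, labels) if lab == class_a]
    b = [v for v, lab in zip(values, labels) if lab == class_b]
    spread = pstdev(a) + pstdev(b)
    if spread == 0:
        # Constant within both classes: no noise to divide by.
        return 0.0 if mean(a) == mean(b) else float("inf")
    return abs(mean(a) - mean(b)) / spread

# Hypothetical data: "x" separates the classes, "y" does not.
labels = ["a", "a", "a", "b", "b", "b"]
columns = {"x": [1, 2, 3, 7, 8, 9],
           "y": [1, 5, 9, 2, 5, 8]}

ranking = sorted(
    ((signal_to_noise(vals, labels, "a", "b"), name)
     for name, vals in columns.items()),
    reverse=True,
)
print(ranking)
```

Attributes at the top of such a ranking are the ones a heuristic would try first when building projections.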
Possible values are ReliefF (DISC_MEAS_RELIEFF), Gain ratio (DISC_MEAS_GAIN), Gini index (DISC_MEAS_GINI) or no measure (DISC_MEAS_NONE). Default: DISC_MEAS_RELIEFF
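For discrete attributes, a Gini-based ranking can be sketched as the reduction in class impurity obtained by splitting the examples on the attribute's values. This textbook formulation and the names below are assumptions and may not match the computation behind DISC_MEAS_GINI exactly.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(values, labels):
    """Impurity of the class minus the weighted impurity within each value."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * gini_impurity(g) for g in groups.values())
    return gini_impurity(labels) - remainder

# Hypothetical data: "colour" determines the class, "shape" is unrelated.
labels = ["a", "a", "b", "b"]
colour = ["red", "red", "green", "green"]
shape = ["square", "round", "square", "round"]

print(gini_gain(colour, labels), gini_gain(shape, labels))
```

An attribute that fully determines the class removes all impurity, while an unrelated attribute leaves it unchanged.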
If the value is set to 0, the heuristic first ranks the attributes (using the measures specified in the attrCont and attrDisc variables) and, once all possible combinations have been tested, progresses to worse-ranked attributes. If the value is set to 1, the heuristic also first ranks the attributes, but then selects attributes randomly according to a gamma distribution. This way the better-ranked attributes are still selected more often, but sometimes they are tested in combination with attributes that are poorly ranked but can in the end produce a high-ranked projection. In domains with a larger set of attributes (>20) it is advisable to use the gamma distribution; otherwise we never get to evaluate projections with poorly ranked attributes. Default: 0
attrCont and attrDisc) we will most likely find projections with the highest scores at the beginning of the evaluation. Default: 2
Radviz specific settings:
optimizationType - see attributeCount below. Possible values are EXACT_NUMBER_OF_ATTRS and MAXIMUM_NUMBER_OF_ATTRS. Default: MAXIMUM_NUMBER_OF_ATTRS
attributeCount - if optimizationType == MAXIMUM_NUMBER_OF_ATTRS then we will consider projections that have between 3 and attributeCount attributes. If optimizationType == EXACT_NUMBER_OF_ATTRS then we will consider only projections that have exactly attributeCount attributes. Default: 4
Methods:
evaluationTime minutes.
VizRank as a learner
VizRank can also be used as a learning method. You can construct a learner by creating an instance of the VizRankLearner class. VizRankLearner accepts three parameters. The first is the type of the visualization method to use (SCATTERPLOT or RADVIZ). The second is an instance of the VizRank class; if it is not given, a new instance is created. The third is a graph instance, either an orngScaleScatterPlotData or an orngScaleRadvizData instance; if it is not specified, a new instance is created.
To change VizRank's settings we simply access them through the learner.VizRank instance (e.g. learner.VizRank.kValue = 10).
The learner instance can be used like any other learner. If you give it examples, it returns a classifier of type VizRankClassifier, which can be used like any other classifier:
When classifying, the VizRank classifier uses the evaluated projections to predict the class of the new example. The evaluated projections serve as arguments for each class value. Arguments have different values (weights), and the example is assigned to the class with the highest sum of argument values.
VizRank's settings that are relevant when using VizRank as a classifier:
A simple example:
References
Leban, G., Bratko, I., Petrovic, U., Curk, T., Zupan, B.: VizRank: finding informative data projections in functional genomics by machine learning. Bioinformatics 21, 413-414 (2005).
Leban, G., Mramor, M., Bratko, I., Zupan, B.: Simple and Effective Visual Models for Gene Expression Cancer Diagnostics. KDD-2005, 167-177 (Chicago, 2005).
Hoffman, P. E., Grinstein, G. G., Marx, K., Grosse, I., Stanley, E.: DNA Visual and Analytic Data Mining. IEEE Visualization 1997 1, 437-441 (1997).