It can often happen that the definition of a discrete attribute (EnumVariable) declares values that do not actually appear in the data, either originally or as a consequence of some preprocessing. Such anomalies are taken care of by a class that, given an attribute and the data, determines whether there are any unused values and reduces the attribute if needed. There are four possible cases.
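- If the attribute has no unused values, the original attribute is returned unchanged.
- If none of the attribute's values appear in the data, the attribute is useless and None is returned.
- If only a single value is used, the outcome depends on the flag removeOneValued, which is False by default, so such attributes are retained unless explicitly specified otherwise; with the flag set, None is returned.
- Otherwise, a new attribute with only the used values is constructed, and its values are computed automatically from the original attribute (ClassifierByLookupTable1 is used for mapping).

Attributes

removeOneValued
Decides whether attributes with only one used value are removed (default: False).

Let us show the use of the class on a simple dataset with three examples, given by the following tab-delimited file.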
The script below constructs a list, newattrs, which contains, for each attribute of the original dataset, either the original attribute, None, or a reduced attribute.
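The four cases can be summarized in a short Python sketch. It does not use Orange: the function reduce_attribute, the (name, values) tuple representation of an attribute, the use of None for unknown values and the toy columns are all assumptions made for illustration, and only discrete attributes are modelled.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: an attribute is (name, declared values) and a
# data column holds value strings, with None standing for an unknown value.
Attribute = Tuple[str, Tuple[str, ...]]


def reduce_attribute(attr: Attribute, column: Sequence[Optional[str]],
                     remove_one_valued: bool = False) -> Optional[Attribute]:
    """Return the original attribute, None, or a reduced copy of it."""
    name, values = attr
    used = {v for v in column if v is not None}
    kept = tuple(v for v in values if v in used)  # used values, declared order
    if not kept:
        return None  # no value appears in the data: the attribute is useless
    if len(kept) == 1:
        # a single used value: retained unless remove_one_valued is set
        return None if remove_one_valued else attr
    if len(kept) == len(values):
        return attr  # no unused values: left alone
    return (name, kept)  # new attribute restricted to the used values


# Invented toy data (three rows), not the contents of unusedValues.tab.
attrs = [("full", ("x", "y")), ("gappy", ("0", "1", "2")),
         ("empty", ("u", "v")), ("constant", ("on", "off"))]
rows = [("x", "0", None, "on"), ("y", "2", None, "on"), ("x", "0", None, "on")]
columns = list(zip(*rows))

newattrs = [reduce_attribute(a, col) for a, col in zip(attrs, columns)]
for old, new in zip(attrs, newattrs):
    if new is None:
        print(old[0], "removed")
    elif new is old:
        print(old[0], "retained")
    else:
        print(old[0], "reduced to", new[1])
```

With these toy columns the sketch reports full and constant as retained, gappy as reduced to ('0', '2') and empty as removed; calling it with remove_one_valued=True would drop constant as well.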
part of unusedValues.py (uses unusedValues.tab)
And here's the script's output.
Attributes a and y are OK and are left alone. In b, value 1 is not used and is removed (not from the original attribute, of course; a new attribute is created). Attribute c is useless and is removed altogether. Attribute d is retained since removeOneValued was left at False; if we set it to True, this attribute would be removed as well.
The values of the new attribute for b are computed automatically from the original one. The script can thus proceed as follows.
part of unusedValues.py (uses unusedValues.tab)
The list newattrs includes some original attributes (a, d and y), a new attribute (b) and a None (for c). The None is removed by the call to filter at the beginning of the script. We use filteredattrs to construct a new domain and then convert the original data to newdata. As the output shows, the two tables are the same except for the removed attribute c.
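The conversion step, which drops the None entries and recomputes the values of reduced attributes from the original ones, can also be sketched in plain Python. This is not Orange's API: lookup_table and convert are hypothetical names, rows are assumed to store value indices, newattrs is written out by hand, and the per-attribute index table only imitates the role ClassifierByLookupTable1 plays in mapping old values to new ones.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: an attribute is (name, declared values) and a
# row stores value indices into those declared values; None means unknown.
Attribute = Tuple[str, Tuple[str, ...]]
Row = Tuple[Optional[int], ...]


def lookup_table(old: Attribute, new: Attribute) -> List[Optional[int]]:
    """For each value index of `old`, give its index in `new` (None if dropped)."""
    return [new[1].index(v) if v in new[1] else None for v in old[1]]


def convert(rows: Sequence[Row], attrs: Sequence[Attribute],
            newattrs: Sequence[Optional[Attribute]]):
    """Drop removed attributes and re-index the values of reduced ones."""
    # Skip the None entries, as filter does in the script.
    kept = [(i, new) for i, new in enumerate(newattrs) if new is not None]
    tables = [lookup_table(attrs[i], new) for i, new in kept]
    new_rows = [
        tuple(None if row[i] is None else table[row[i]]
              for (i, _), table in zip(kept, tables))
        for row in rows
    ]
    return [new for _, new in kept], new_rows


# Invented toy data; newattrs is the kind of list a reduction step would
# produce (None marks a removed attribute).
attrs = [("full", ("x", "y")), ("gappy", ("0", "1", "2")),
         ("empty", ("u", "v")), ("constant", ("on", "off"))]
newattrs = [attrs[0], ("gappy", ("0", "2")), None, attrs[3]]
rows = [(0, 0, None, 0), (1, 2, None, 0), (0, 0, None, 0)]

domain, new_rows = convert(rows, attrs, newattrs)
decoded = [[a[1][v] for a, v in zip(domain, row)] for row in new_rows]
print([a[0] for a in domain])  # ['full', 'gappy', 'constant']
print(decoded)  # [['x', '0', 'on'], ['y', '2', 'on'], ['x', '0', 'on']]
```

Decoding the converted rows back to value names shows the same table as the input minus the removed column, which mirrors what the script's output shows for c.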