Filters are objects used for selecting examples. They are related to preprocessors but more limited: a filter can accept or reject an example, but cannot modify it. A further restriction is that a filter sees only individual examples, not entire datasets. This matters for random selection of examples (see Filter_random).
All filters have two attributes.
Attributes
negate: Inverts the selection.
domain: The domain used when checking examples (except for Filter_random, which ignores this field).
Besides the constructor, filters provide the call operator and a method that returns a list denoting which examples match the filter criterion.
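Methods
Call with a single example: returns True or False, depending on whether the example matches the criterion.
Call with a set of examples: returns the examples (as an ExampleTable) that match the criterion.
List-returning method: takes examples and returns a list with one entry per example, denoting which examples are accepted. Equivalent to [filter(ex) for ex in examples].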
An alternative way to apply a filter to a table of examples is to call ExampleTable.filter.
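The calling conventions described above can be modelled in plain Python. This is only a sketch of the behaviour, not Orange's implementation: the class SketchFilter and its methods select and selection_list are hypothetical names, and examples are plain numbers rather than Orange examples.

```python
class SketchFilter:
    """Plain-Python stand-in for a filter; not Orange's implementation."""

    def __init__(self, criterion, negate=False):
        self.criterion = criterion  # function: example -> bool
        self.negate = negate        # inverts the selection when True

    def __call__(self, example):
        # A filter sees one example at a time and answers True or False.
        return bool(self.criterion(example)) != self.negate

    def select(self, examples):
        # Hypothetical helper: the examples that match the criterion.
        return [ex for ex in examples if self(ex)]

    def selection_list(self, examples):
        # One bool per example, i.e. [filter(ex) for ex in examples].
        return [self(ex) for ex in examples]


examples = [3, 8, 5, 12]                               # made-up "examples"
big = SketchFilter(lambda ex: ex > 4)                  # accepts values above 4
small = SketchFilter(lambda ex: ex > 4, negate=True)   # the inverted selection
```

Here big.selection_list(examples) gives one boolean per example, while big.select(examples) and small.select(examples) split the list into two complementary parts.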
Filter_random accepts an example with a given probability.
Attributes
The inherited attribute domain is ignored.
For this script, example should be some learning example; you can load any data and set example = data[0]. The script's result will always be the same. Although the probability of selecting an example is set to 0.7, the filter accepted the example six times out of ten. Since the filter sees only individual examples, it cannot hit the requested proportion exactly; if you need to select exactly 70% of the examples in a dataset, use random indices.
Setting the random generator ensures that the filter always selects the same examples, regardless of how many times you run the script or what you do (in Orange) before you run it. randomGenerator=24 is a shortcut for randomGenerator = orange.RandomGenerator(24) or randomGenerator = orange.RandomGenerator(initseed=24).
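The two points above (a fixed seed gives repeatable decisions, and per-example random acceptance does not guarantee an exact proportion) can be illustrated without Orange. The sketch below uses Python's own random.Random as a stand-in for orange.RandomGenerator; SketchRandomFilter, accepted_count and exact_subset are hypothetical names, and the number of accepted examples depends on Python's generator, not Orange's.

```python
import random


class SketchRandomFilter:
    """Accepts each example independently with probability prob."""

    def __init__(self, prob, seed):
        self.prob = prob
        # Python's generator stands in for orange.RandomGenerator here.
        self.rng = random.Random(seed)

    def __call__(self, example):
        # The example itself is never inspected; only the generator decides.
        return self.rng.random() < self.prob


def accepted_count(seed, n=10):
    # Count how many of n calls accept; varies with the seed, not the data.
    flt = SketchRandomFilter(0.7, seed)
    return sum(flt(i) for i in range(n))


def exact_subset(examples, fraction, seed):
    # Selecting by shuffled indices sees the whole dataset, so it can
    # return exactly the requested share of examples.
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    keep = sorted(indices[:int(round(fraction * len(examples)))])
    return [examples[i] for i in keep]
```

Calling accepted_count(24) repeatedly returns the same number every time, but that number need not be 7; exact_subset(list(range(10)), 0.7, 24) always has seven elements.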
To select a subset of examples instead of calling the filter for each individual example, apply the filter to the whole set of examples.
Filter_isDefined selects examples for which all attribute values are defined (known). By default, the filter checks all attributes; you can modify the list check to select the attributes to be checked. Meta-attributes are never checked. (There is an obsolete filter, Filter_hasSpecial, which does the opposite: it selects examples with at least one unknown value in any attribute, including the class attribute. Filter_hasSpecial always checks all attributes except meta-attributes.) Filter_hasClass selects examples with a defined class value. You can use negate to invert the selection, as shown in the script below.
Attributes
check: A list specifying which attributes are checked. By default, check is None, meaning that all attributes are checked. The list is initialized to a list of trues when the filter's domain is set, unless the list already exists. You can also set check manually, even without setting the domain. The list can be indexed by ordinary integers (e.g., check[0]); if domain is set, you can also address the list by attribute names or descriptors.
As for all Orange objects, it is not recommended to modify the domain after it has been set, unless you know exactly what you are doing. In this particular case, changing the domain would disrupt the correspondence between the domain attributes and the check list, causing unpredictable behaviour.
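The role of the check list can be sketched in plain Python. This is a model of the described behaviour rather than Orange code: examples are lists, None stands for an unknown value, the rows are made up, and is_defined is a hypothetical helper.

```python
def is_defined(example, check=None):
    # check=None means that every attribute is checked.
    if check is None:
        check = [True] * len(example)
    # Only positions flagged True in check must hold a known value.
    return all(value is not None
               for value, flag in zip(example, check) if flag)


# Made-up rows; None marks an unknown value.
data = [
    ["young", "myope", "no"],
    ["young", None, "yes"],
    [None, "hypermetrope", "no"],
]

all_checked = [is_defined(ex) for ex in data]
first_only = [is_defined(ex, [True, False, False]) for ex in data]
negated = [not is_defined(ex) for ex in data]  # the effect of negate
```

With every attribute checked, only the first row passes; checking the first attribute alone also lets the second row through.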
part of filter.py (uses lenses.tab)
Filter_hasMeta filters out the examples that don't have (or that do have, when negated) a meta attribute with the given id.
This filter is especially useful with examples from the basket format and their optional meta attributes. If they come, for instance, from a text mining domain, we can use it to get the documents that contain a certain word.
part of filterm.py (uses inquisition.basket)
This example, which will print out all instances that contain the word "surprise", gets the id of the meta attribute from the domain by searching for the attribute named "surprise". This meta attribute is optional and does not necessarily appear in all examples. To fully understand how this particular example works, you should be familiar with optional meta attributes and the basket file format.
This filter can of course also be used in other situations involving meta values that appear only in some examples. The corresponding attributes do not need to be registered in the domain.
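As a rough illustration of selecting by the presence of an optional meta value, the sketch below models examples as dictionaries with a metas mapping from meta id to value. Everything in it is an assumption made for illustration: the id -2, the documents, and the helper has_meta are hypothetical and do not come from Orange or from inquisition.basket.

```python
SURPRISE_ID = -2  # hypothetical id of the meta attribute "surprise"


def has_meta(example, meta_id, negate=False):
    # Accept the example when it carries a meta value with the given id;
    # negate inverts the decision.
    return (meta_id in example["metas"]) != negate


# Made-up documents; each carries only the meta values it actually has.
docs = [
    {"text": "doc A", "metas": {-2: 1.0, -3: 2.0}},
    {"text": "doc B", "metas": {-3: 1.0}},
    {"text": "doc C", "metas": {}},
]

with_word = [d["text"] for d in docs if has_meta(d, SURPRISE_ID)]
without_word = [d["text"] for d in docs
                if has_meta(d, SURPRISE_ID, negate=True)]
```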
Filter_sameValue is a fast filter for selecting examples with a particular value of some attribute.
Attributes
If domain is not set, make sure that examples are from the right domain so that position applies to the attribute you want.
part of filter.py (uses lenses.tab)
This script selects examples with age="young" from the lenses dataset. Setting position is somewhat tricky: data.domain.attributes behaves as a list and provides the method index, which we can use to retrieve the position of the attribute age. The attribute age is also needed to construct a Value.
As you can see, this filter is dirty but quick.
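The idea of looking up a position and comparing a single value can be sketched in plain Python. The attribute order and rows below are made up, same_value is a hypothetical helper, and plain lists stand in for data.domain.attributes and for Orange examples.

```python
# Made-up attribute order standing in for data.domain.attributes.
attributes = ["age", "prescription", "astigmatic"]

rows = [
    ["young", "myope", "no"],
    ["presbyopic", "myope", "yes"],
    ["young", "hypermetrope", "yes"],
]


def same_value(example, position, value, negate=False):
    # Compare only the value stored at the given position.
    return (example[position] == value) != negate


position = attributes.index("age")  # list-style lookup of the position
young = [r for r in rows if same_value(r, position, "young")]
not_young = [r for r in rows if same_value(r, position, "young", negate=True)]
```

The sketch also shows why the filter is quick but fragile: it trusts position blindly, so rows with a different attribute order would be compared on the wrong attribute.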
The ValueFilter class provides different methods for filtering values of continuous attributes: ValueFilter.Equal, ValueFilter.Less, ValueFilter.LessEqual, ValueFilter.Greater, ValueFilter.GreaterEqual, ValueFilter.Between, ValueFilter.Outside.
The following excerpt uses two different filters: ValueFilter.GreaterEqual, which needs only one parameter, and ValueFilter.Between, which is defined by two parameters.
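The difference between one-parameter and two-parameter comparisons can be modelled in plain Python. This is not the filterv.py excerpt and not Orange's API: matches is a hypothetical function, the values are made up, and treating the bounds of Between as inclusive is an assumption.

```python
def matches(value, oper, ref, upper=None):
    # One-parameter comparisons use ref only.
    if oper == "Equal":
        return value == ref
    if oper == "Less":
        return value < ref
    if oper == "LessEqual":
        return value <= ref
    if oper == "Greater":
        return value > ref
    if oper == "GreaterEqual":
        return value >= ref
    # Two-parameter comparisons use ref and upper as interval bounds
    # (inclusive bounds are an assumption of this sketch).
    if oper == "Between":
        return ref <= value <= upper
    if oper == "Outside":
        return value < ref or value > upper
    raise ValueError("unknown operator: %s" % oper)


lengths = [4.9, 5.5, 6.3, 7.1]  # made-up continuous values
at_least = [v for v in lengths if matches(v, "GreaterEqual", 5.5)]
between = [v for v in lengths if matches(v, "Between", 5.0, 6.5)]
outside = [v for v in lengths if matches(v, "Outside", 5.0, 6.5)]
```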
part of filterv.py (uses iris.tab)
Filter_Values performs a similar function to Filter_sameValue, but can handle conjunctions and disjunctions of more complex conditions.
Attributes
conditions: A list of type ValueFilterList that contains conditions.
conjunction: If true, an example is accepted if no values are rejected. If false, an example is accepted if at least one value is accepted.
Elements of the list conditions must be objects of type ValueFilter_discrete for discrete and ValueFilter_continuous for continuous attributes; both are derived from ValueFilter.
Both have fields position, denoting the position of the checked attribute (just as in Filter_sameValue), and acceptSpecial, which determines whether undefined values are accepted (1), rejected (0) or simply ignored (-1, default).
ValueFilter_discrete has a field values of type ValueList that contains objects of type Value representing the acceptable values.
ValueFilter_continuous has fields min and max that define an interval, and a field outside that tells whether values outside or inside the interval are accepted. The default is false (inside).
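How a single condition reaches one of the three outcomes (accepted, rejected, ignored) can be sketched in plain Python. The functions check_discrete and check_continuous are hypothetical models of the two condition types, None stands for an undefined value, and inclusive interval bounds are an assumption.

```python
ACCEPT, REJECT, IGNORE = 1, 0, -1  # the three acceptSpecial codes


def check_discrete(value, values, accept_special=IGNORE):
    # An undefined value is handled solely by accept_special.
    if value is None:
        return accept_special
    return ACCEPT if value in values else REJECT


def check_continuous(value, lo, hi, outside=False, accept_special=IGNORE):
    if value is None:
        return accept_special
    inside = lo <= value <= hi  # inclusive bounds assumed
    # outside=True flips which side of the interval is acceptable.
    return ACCEPT if inside != outside else REJECT


ages = ["young", "presbyopic"]
known = check_discrete("young", ages)
unknown_ignored = check_discrete(None, ages)
unknown_rejected = check_discrete(None, ages, accept_special=REJECT)
inside_ok = check_continuous(5.0, 4.0, 6.0)
outside_ok = check_continuous(7.0, 4.0, 6.0, outside=True)
```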
part of filter.py (uses lenses.tab)
This script selects examples whose age is "young" or "presbyopic" and which are astigmatic. Unknown values are ignored: if the value of one of the two attributes is missing, only the other is checked; if both are missing, the example is accepted.
The script first constructs the filter and assigns a domain. Then it appends both conditions to the filter's conditions field. Both are of type orange.ValueFilter_discrete, since the two attributes are discrete. The position of each attribute is obtained in the same way as for Filter_sameValue, described above.
The list of conditions can also be given to the filter constructor. The following filter will accept examples whose age is "young" or "presbyopic" or who are astigmatic (conjunction = 0). In contrast to the filter above, unknown age is not acceptable (but examples with unknown age can still be accepted if they are astigmatic). Meanwhile, examples with unknown astigmatism are always accepted.
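The way the two filters described here treat unknown values follows from the acceptance rules for conjunction and disjunction, and can be checked with a plain-Python model. This is not the filter.py script: outcome, accepts, first_filter and second_filter are hypothetical names, None stands for an unknown value, and the astigmatic values "yes" and "no" are assumed.

```python
def outcome(value, accepted_values, accept_special):
    # 1 = accepted, 0 = rejected, -1 = ignored (the acceptSpecial codes).
    if value is None:
        return accept_special
    return 1 if value in accepted_values else 0


def accepts(outcomes, conjunction):
    if conjunction:
        # Accepted when no condition rejects; ignored ones do not count.
        return all(r != 0 for r in outcomes)
    # Accepted when at least one condition accepts.
    return any(r == 1 for r in outcomes)


AGES = {"young", "presbyopic"}


def first_filter(age, astigmatic):
    # Conjunction; unknown values are ignored (-1) in both conditions.
    return accepts([outcome(age, AGES, -1),
                    outcome(astigmatic, {"yes"}, -1)], conjunction=True)


def second_filter(age, astigmatic):
    # Disjunction; unknown age is rejected (0), unknown astigmatism
    # is accepted (1).
    return accepts([outcome(age, AGES, 0),
                    outcome(astigmatic, {"yes"}, 1)], conjunction=False)
```

For instance, first_filter(None, None) is accepted because nothing is rejected, while second_filter(None, "no") is rejected because neither condition accepts.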
part of filter.py (uses lenses.tab)
If you don't find this filter attractive, use Preprocessor_take instead, which is less flexible but more intelligent and friendly.