A domain descriptor (orange.Domain) serves three purposes.
orange.Domain is a list of attributes. Each example (orange.Example) is associated with a domain descriptor, and so are example tables, many classifiers and other objects. For instance, learning examples are stored as lists of integers/floats (an array of C unions, actually) and only the associated domain gives them a meaning.
orange.Domain's other responsibility is to convert examples between domains. It is through orange.Domain that classifiers convert the testing examples if they are presented in a different domain (i.e. with different attributes than the learning examples).
orange.Domain has a few publicly accessible fields. A domain descriptor is referenced by many objects, and modifying it is so unsafe that even the underlying C++ code only does so on freshly created domains.
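The point that stored values only acquire meaning through the domain can be illustrated with a small pure-Python toy model. It is only a sketch: ToyDiscrete and decode are hypothetical names invented for this illustration and are not part of Orange, and the real storage is the C structure mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDiscrete:
    """Hypothetical stand-in for a discrete attribute descriptor (not an Orange class)."""
    name: str
    values: tuple  # symbolic values; an example stores an index into this tuple


def decode(raw, attributes):
    """Pair each stored index with its attribute name and symbolic value."""
    return {attr.name: attr.values[int(v)] for attr, v in zip(attributes, raw)}


shape = ToyDiscrete("shape", ("round", "square"))
color = ToyDiscrete("color", ("red", "green", "blue"))

raw_example = [1, 2]  # meaningless on its own
decoded = decode(raw_example, [shape, color])
print(decoded)
```

Without the list of descriptors, the pair [1, 2] could not be interpreted at all; with it, each number maps to a named symbolic value.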
Attributes
None if the domain is classless. In the latter case, attributes and variables are equal.
domainVersion
There are numerous ways to construct an orange.Domain. Let a, b and c be three attribute descriptors (for instance, created by a, b, c = [orange.EnumVariable(x) for x in ["a", "b", "c"]]).
orange.Variable. The last attribute in the list will be the class attribute.
Class-attribute must be an attribute descriptor (orange.Variable), not a string with an attribute name.
source can be an existing domain or a list of attribute descriptors.
For d2, we want to include "a", b and c. To resolve the string "a", Orange checks the given source, domain d1, to see whether it contains an attribute with that name. As it does, the descriptor (a) is added to d2. The other two attributes, b and c, are added without checking d1.
Here, d2 includes all three attributes but no class attribute. Note also that we have used a list of attribute descriptors as source instead of an existing domain, as we did in the previous example.
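The name-resolution rule described above (strings are looked up in the source, descriptors are taken as given) can be sketched in plain Python. This is a toy model under stated assumptions: ToyVariable and resolve are hypothetical helpers, and raising KeyError for an unknown name is a choice made for this sketch, not documented Orange behaviour.

```python
class ToyVariable:
    """Hypothetical stand-in for an attribute descriptor (not an Orange class)."""

    def __init__(self, name):
        self.name = name


def resolve(spec, source):
    """Return descriptors for spec; strings are looked up by name in source."""
    by_name = {var.name: var for var in source}
    resolved = []
    for item in spec:
        if isinstance(item, str):
            if item not in by_name:
                # Assumption of this sketch: an unknown name is an error.
                raise KeyError(f"attribute {item!r} not found in source")
            resolved.append(by_name[item])
        else:
            resolved.append(item)  # descriptors are added without any lookup
    return resolved


a, b, c = (ToyVariable(n) for n in "abc")
d1 = [a, b, c]                   # the source: here a plain list of descriptors
d2 = resolve(["a", b, c], d1)    # "a" is resolved through d1; b and c are not
```

The source may be either an existing domain or a bare list of descriptors; in this sketch both are represented by the same kind of list.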
d1 with attributes a and b, and c was the class. In the second domain, d2, c becomes an ordinary attribute and a is the class attribute.
This constructor can be used to add classes to classless domains.
d1 is a classless domain and in d2 we added the attribute c as the class attribute. If c existed before (as an ordinary attribute), it would, naturally, be removed from the list of ordinary attributes.
There are three convenient functions for checking whether the domain contains any discrete, continuous or other-type attributes.
Examples can be converted from one domain to another by calling the domain descriptor.
domain2.py (uses monk1.tab)
You will probably convert examples this way when writing your own classifiers in Python. Existing classifiers do exactly the same: orange.BayesClassifier, for instance, stores the domain of the learning examples and calls it to convert the examples to be classified.
An equivalent way of converting examples is to construct a new example, passing the new domain to the constructor.
Example tables can be converted in a similar manner.
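The idiom of converting an example by calling the target domain can be modelled in a few lines of plain Python. The sketch below is not Orange code: ToyDomain and ToyExample are hypothetical classes, and matching attributes purely by name is a simplification assumed for illustration.

```python
class ToyExample:
    """Hypothetical example: a list of values plus the domain that explains them."""

    def __init__(self, domain, values):
        self.domain = domain
        self.values = list(values)


class ToyDomain:
    """Hypothetical stand-in for a domain: an ordered list of attribute names."""

    def __init__(self, names):
        self.names = list(names)

    def __call__(self, example):
        """Convert a ToyExample into this domain, matching attributes by name."""
        lookup = dict(zip(example.domain.names, example.values))
        # Attributes missing from the source example become None (unknown).
        return ToyExample(self, [lookup.get(name) for name in self.names])


full = ToyDomain(["a", "b", "c", "y"])
reduced = ToyDomain(["c", "a", "y"])

ex = ToyExample(full, [1, 0, 2, 1])
converted = reduced(ex)  # calling the domain performs the conversion
print(converted.values)
```

A classifier written in this style would store the domain it was trained on and call it on every incoming example before classifying it.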
Meta-values are additional values that can be attached to examples and can have any meaning you want. It is not necessary that all examples in an example table (or even all examples from some domain) have a certain meta-value. See the documentation on Example for a more thorough description of meta-values.
Meta attributes that appear in examples can be, but do not need to be, known to the domain descriptor. Even if they are known, there are no obligations either way: the domain does not need to know about any meta values that are attached to examples, and examples do not need to have (all) meta values that are "registered" in the corresponding domain.
Why register meta attributes with the domain?
Besides example[id] (where id needs to be an integer), values of registered meta attributes are also accessible through string or variable descriptor indices (example["age"]).
For the latter two points - saving to a file and construction of new examples - there is an additional flag, added for several practical reasons: a meta attribute can be marked as "optional". Such meta attributes are not saved and not added to newly constructed examples. This functionality is used in, for instance, the above mentioned basket format, where new meta attributes are created while loading the file and we certainly don't want a new example to contain all words from the past examples.
There is another distinction between the optional and non-optional meta attributes: the latter are expected to be present in all examples of that domain. Saving to files, for one, expects them and will fail if a non-optional meta value is missing. Optional attributes may be missing. These rules are, however, mostly not strictly enforced, so adhering to them is largely up to you.
In general, register the meta attributes which are permanent and have a certain meaning. If you can't name it, you probably don't want to register it. An animal name in the zoo data set or a patient ID in a typical medical data set are good examples of non-optional registered attributes. Word counts in basket format are optional registered attributes. The temporary example weights in bagging or example weights in certain configurations of tree induction or rule learning should be left unregistered.
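The saving rules described above can be summarised as a small pure-Python sketch. It is a toy model only: metas_to_save is a hypothetical function, the dictionary layouts are invented for this illustration, and real Orange enforces these rules less strictly, as noted above.

```python
def metas_to_save(example_metas, registered):
    """Pick the meta values that would be written for one example.

    example_metas: {meta_id: value} attached to the example.
    registered:    {meta_id: (name, optional)} known to the domain.
    """
    row = {}
    for meta_id, (name, optional) in registered.items():
        if optional:
            continue  # optional metas are not saved
        if meta_id not in example_metas:
            raise ValueError(f"non-optional meta attribute {name!r} is missing")
        row[name] = example_metas[meta_id]
    return row  # unregistered ids are never looked at


registered = {-1: ("name", False), -2: ("word_count", True)}
example_metas = {-1: "aardvark", -2: 3.0, -7: 0.25}  # -7: an unregistered weight
saved = metas_to_save(example_metas, registered)
print(saved)
```

Only the non-optional registered attribute survives; the optional word count and the unregistered temporary weight are both left out.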
Since meta attributes do not have a great impact, they can be added and removed even after the domain is constructed and examples of that domain already exist. For instance, if data contains the Monk 1 data set, we can add a new continuous attribute named "misses" with the following code.
We shall first provide a few examples; a detailed description of the methods follows later.
domain2.py (uses monk1.tab)
Note that nothing changed in the example. No attributes are added. (This is only natural; the domain descriptor has no idea about which objects refer to it.) As already mentioned, registering meta attributes enables addressing by indexing, either by attribute name or by its descriptor. For instance, to set the attribute to 0 for all examples in the table, you could proceed with
domain2.py (uses monk1.tab)
An alternative is referring by name.
Both alternatives are more elegant than the one to which you would have to resort if the meta attribute were not made known to the domain:
Registering the meta attribute also enhances printouts. When an example is printed, meta-values for registered attributes are shown as "name:value" pairs, while for unregistered ones you will get ids instead of names.
As you can learn by reading the documentation on ExampleTable, the best way to add a meta attribute to a whole example table is by calling
This again works only if "misses" is registered in data.domain (which we did by calling data.domain.addmeta(id, misses)). If it's not, you should use an id instead of the string "misses" and change 0 to 0.0 to prevent the value from being interpreted as a discrete value (if there is no descriptor, Orange has no idea what you mean by 0: a discrete index or a continuous value):
In a massive testing of different models, you could count the number of times that each example was misclassified by calling classifiers in the following loop.
domain2.py (uses monk1.tab)
The other effect of registering meta attributes is that they appear in converted examples. That is, whenever an example is converted to a certain domain, the example will have all the meta attributes that are declared in that domain. If the meta attributes occur in the original domain of the example or can be computed from the attributes in the original domain, they will have appropriate values. Otherwise, their values will be DK (unknown).
Domain d2 is constructed to have only the attributes a, b, e and the class attribute, while the other three attributes are added as meta attributes, along with a mysterious additional attribute X.
After conversion, the three attributes are moved to meta attributes and the new attribute appears as unknown.
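The effect described above can be mimicked with a pure-Python toy model. It is a sketch under assumptions: convert_with_metas is a hypothetical helper, the "?" placeholder merely stands in for an unknown value, and the attribute names c, d, f and y are assumed here for the three remaining Monk 1 attributes and the class.

```python
DK = "?"  # placeholder for an unknown ("don't know") value in this toy model


def convert_with_metas(source, attributes, metas):
    """Convert a {name: value} example into ordinary values plus meta values.

    attributes: names of ordinary attributes (class last) in the target domain.
    metas:      {meta_id: name} registered in the target domain.
    """
    values = [source.get(name, DK) for name in attributes]
    meta_values = {mid: source.get(name, DK) for mid, name in metas.items()}
    return values, meta_values


# Attribute names below are assumed for illustration.
source = {"a": 1, "b": 1, "c": 1, "d": 1, "e": 3, "f": 1, "y": 1}
values, meta_values = convert_with_metas(
    source,
    attributes=["a", "b", "e", "y"],
    metas={-1: "c", -2: "d", -3: "f", -4: "X"},
)
print(values, meta_values)
```

The three attributes that exist in the source keep their values as meta values, while X, which cannot be found or computed, comes out as unknown.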
orange.newmetaid(). (You can cheat by giving any negative integer, say -42, when you would only like to quickly try something. You shouldn't do that in final code, since calling orange.newmetaid ensures that ids are unique. We did so here to ensure that the code's printout is always the same.) The descriptor should be an attribute descriptor derived from orange.Variable, such as EnumVariable, FloatVariable or StringVariable.
The optional third argument tells whether the meta attribute is optional (when the value of the argument is non-zero) or not (when it is zero). Different non-zero values can be used for different kinds of optional attributes, if needed. These values are application dependent and Orange offers no corresponding registration facilities. If omitted, the attribute is not optional.
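The roles of the unique id and the optional flag can be sketched with a toy registry in plain Python. Everything here is hypothetical and only mirrors the description above: new_meta_id and ToyMetaRegistry are invented names, strings stand in for descriptors, and issuing negative integers is an assumption drawn from the "-42" remark.

```python
import itertools

_ids = itertools.count(1)


def new_meta_id():
    """Return a fresh negative integer id (toy analogue of a unique-id source)."""
    return -next(_ids)


class ToyMetaRegistry:
    """Hypothetical registry of meta attributes kept by a domain."""

    def __init__(self):
        self._metas = {}      # meta_id -> descriptor
        self._optional = {}   # meta_id -> integer flag, 0 means not optional

    def addmeta(self, meta_id, descriptor, optional=0):
        self._metas[meta_id] = descriptor
        self._optional[meta_id] = optional

    def is_optional(self, meta_id):
        return self._optional[meta_id] != 0


registry = ToyMetaRegistry()
misses_id = new_meta_id()
words_id = new_meta_id()
registry.addmeta(misses_id, "misses")          # third argument omitted
registry.addmeta(words_id, "word_count", 2)    # any non-zero value marks it optional
```

Drawing ids from a single counter is what guarantees uniqueness; a hard-coded id would only be safe in throwaway experiments.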
addmeta described above, except that it can be used to add multiple meta attributes at once. The dictionary it accepts as an argument is in the same form as the one returned by getmetas(). Therefore, to add all meta attributes from domain to newdomain, use the following statement.
True if the meta attribute is optional and False if it's not.
To a certain extent, domains behave like lists. The length of a domain is the number of its attributes, including the class attribute. Iterating through a domain goes through attributes and the class attribute, but not through meta attributes. You can get slices, but cannot set them (since domains are immutable). Domains can be indexed by integer indices, attribute names or descriptors. Domain has a method index(var) that returns the index of an attribute specified by a descriptor, name (or index, if you believe this makes sense).
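The list-like behaviour can be imitated by a small read-only container in plain Python. This is a toy model rather than Orange's implementation: ToyListDomain is a hypothetical class, and attribute names double as descriptors to keep the sketch short.

```python
class ToyListDomain:
    """Hypothetical read-only, list-like container of attribute names."""

    def __init__(self, attributes, class_var=None, metas=()):
        self._variables = tuple(attributes) + ((class_var,) if class_var else ())
        self.metas = tuple(metas)  # kept apart: not part of len() or iteration

    def __len__(self):
        return len(self._variables)

    def __iter__(self):
        return iter(self._variables)

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return self._variables[key]
        return self._variables[self.index(key)]

    def index(self, var):
        if isinstance(var, int):
            return var  # an index maps to itself
        return self._variables.index(var)

    # No __setitem__ is defined, so items and slices cannot be assigned.


domain = ToyListDomain(["a", "b"], class_var="y", metas=["misses"])
print(len(domain), list(domain), domain[0:2], domain.index("y"))
```

Leaving out the item-assignment method is what makes the container immutable from the outside, which matches the restriction on setting slices.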