Example

orange.Example holds examples: a list of attribute values together with some auxiliary data. In Python they appear as lists and resemble ordinary Python lists to the extent possible. There are, however, differences: each example corresponds to some domain, so the number of attributes ("list elements") and their types are always as prescribed by that domain.

Attributes

domain (read-only)
Each example corresponds to a domain. This field is set at construction time and cannot be modified.
name
An example can be assigned a name. Orange's own methods do not use it; it is provided for use in user interfaces.

Examples cannot be assigned arbitrary attributes like other Python and Orange objects ("attribute" is here meant in the sense "class attribute"). For instance, if ex is an example, ex.xxx=12 will yield an error.
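
The restriction can be pictured with a small pure-Python sketch. FixedExample is a hypothetical stand-in, not Orange's implementation; it only assumes that the set of allowed attributes is fixed in advance, which plain Python expresses with __slots__. It does not import orange and does not model the read-only nature of domain.

```python
# Hypothetical stand-in for orange.Example; it only illustrates a fixed attribute set.
class FixedExample(object):
    __slots__ = ("domain", "name")   # the only attributes an instance may carry

    def __init__(self, domain, name=None):
        self.domain = domain
        self.name = name


ex = FixedExample(domain=["age", "lenses"])
ex.name = "patient 7"                # allowed: 'name' is a declared attribute

try:
    ex.xxx = 12                      # not declared, so Python refuses the assignment
    failed = False
except AttributeError:
    failed = True
```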

Construction

To construct a new example, you first need a domain description. You can construct one yourself or load it from a file (which otherwise also contains some examples). For the sake of simplicity, we shall load a domain from the "lenses" dataset.

part of example.py (uses lenses.tab)

>>> import orange
>>> data = orange.ExampleTable("lenses")
>>> domain = data.domain
>>> for attr in domain:
...     print attr.name, attr.values
...
age <young, pre-presbyopic, presbyopic>
prescription <myope, hypermetrope>
astigmatic <no, yes>
tear_rate <reduced, normal>
lenses <none, soft, hard>

orange.Example(domain)
This is a basic constructor that creates an example with all values unknown. Setting them is one of the subjects of this page.
orange.Example(domain, list-of-values)
This constructs an initialized example. The list can contain anything that can be converted to a value (see documentation on orange.Value) and must be of appropriate length, one value for each corresponding attribute.

>>> ex = orange.Example(domain, ["young", "myope", "yes", "reduced", "soft"])
>>> print ex
['young', 'myope', 'yes', 'reduced', 'soft']
>>> ex = orange.Example(domain, ["young", 0, 1, orange.Value(domain[3], "reduced"), "soft"])
>>> print ex
['young', 'myope', 'yes', 'reduced', 'soft']

The first example was constructed by giving values as strings. That's what you'll usually do; continuous values can, naturally, be given as numbers (or as strings, if you desire so). In the second example, we've shown alternatives: the second and the third values are given by indices and for the fourth we have constructed an orange.Value (something that orange would do for us automatically anyway if we just passed a string).
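
The equivalence of names and indices can be mimicked without Orange. The sketch below is a pure-Python model under stated assumptions: DOMAIN, normalize and make_example are hypothetical names, and a domain is reduced to a list of (attribute name, allowed values) pairs. It only shows why "myope" and 0 end up as the same value.

```python
# Hypothetical model of the lenses domain: (attribute name, allowed values).
DOMAIN = [
    ("age", ["young", "pre-presbyopic", "presbyopic"]),
    ("prescription", ["myope", "hypermetrope"]),
    ("astigmatic", ["no", "yes"]),
    ("tear_rate", ["reduced", "normal"]),
    ("lenses", ["none", "soft", "hard"]),
]


def normalize(raw, allowed):
    """Return the value name for raw, given either a name or an index into allowed."""
    if isinstance(raw, int):
        return allowed[raw]
    if raw not in allowed:
        raise ValueError("%r is not one of %r" % (raw, allowed))
    return raw


def make_example(domain, raws):
    """Build a list of value names; one raw value is required per attribute."""
    if len(raws) != len(domain):
        raise ValueError("expected %d values, got %d" % (len(domain), len(raws)))
    return [normalize(raw, allowed) for raw, (_, allowed) in zip(raws, domain)]


by_name = make_example(DOMAIN, ["young", "myope", "yes", "reduced", "soft"])
by_index = make_example(DOMAIN, ["young", 0, 1, "reduced", "soft"])
```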

orange.Example(example)
This is cloning: a new example is created which is exactly the same as the original.
orange.Example(domain, example)
This form of constructor can be used for converting examples from other domains.

>>> reduced_dom = orange.Domain(["age", "lenses"], domain)
>>> reduced_ex = orange.Example(reduced_dom, ex)
>>> print reduced_ex
['young', 'soft']

If domain is the same as original example's domain, this constructor is equivalent to the previous one.
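
As a rough pure-Python picture of the conversion, one can think of values being looked up by attribute name. The helper convert below is hypothetical and is not how Orange implements the constructor; it also assumes that an attribute missing from the source simply stays unknown (None here).

```python
def convert(values, source_names, target_names):
    """Pick the values for target_names out of an example given as a plain list.

    Attributes missing from the source become None, standing in for 'unknown'.
    """
    by_name = dict(zip(source_names, values))
    return [by_name.get(name) for name in target_names]


lenses_names = ["age", "prescription", "astigmatic", "tear_rate", "lenses"]
ex_values = ["young", "myope", "yes", "reduced", "soft"]

reduced = convert(ex_values, lenses_names, ["age", "lenses"])
same = convert(ex_values, lenses_names, lenses_names)   # same domain: a plain copy
```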

orange.Example(domain, list-of-examples)
This constructor is essentially similar to the converting constructor, orange.Example(domain, example), which fills the example with values obtained from another example, except that it fills the example with values obtained from multiple examples. The needed values are sought in ordinary and meta-attributes registered with the corresponding domains. Meta-attributes that appear in the given examples but do not appear in the new example, either as ordinary or as meta-attributes, are copied as well.

We shall demonstrate the function on the datasets merge1.tab and merge2.tab; the first has attributes a1 and a2, and meta-attributes m1 and m2, while the second has attributes a1 and a3 and meta-attributes m1 and m3.

example_merge.py (uses merge1.tab, merge2.tab)

import orange

data1 = orange.ExampleTable("merge1")
data2 = orange.ExampleTable("merge2", use = data1.domain)

a1, a2 = data1.domain.attributes
# each item is a tuple (meta id, descriptor)
m1, m2 = data1.domain.getmetas().items()
a1, a3 = data2.domain.attributes

n1 = orange.FloatVariable("n1")
n2 = orange.FloatVariable("n2")

# m1 becomes an ordinary attribute, so only its descriptor is needed
newdomain = orange.Domain([a1, a3, m1[1], n1])
# m2 stays a meta-attribute and keeps its original id
newdomain.addmeta(m2[0], m2[1])
newdomain.addmeta(orange.newmetaid(), a2)
newdomain.addmeta(orange.newmetaid(), n2)

merge = orange.Example(newdomain, [data1[0], data2[0]])
print "First example: ", data1[0]
print "Second example: ", data2[0]
print "Merge: ", merge

The newdomain consists of several attributes from data1 and data2: a1, a3 and m1 are ordinary, and m2 and a2 are meta-attributes. Variables m1 and m2 are really tuples of a meta-id and a descriptor (Variable). For this reason, orange.Domain is initialized with m1[1], the descriptor, while for adding meta-attributes we use m2[0] and m2[1], so that m2 has the same id in both domains. For the meta-attribute a2, which was originally an ordinary attribute, we obtain a new id.

In addition, newdomain has two new attributes, n1 and n2, the first as ordinary and the second as meta-attribute.

First example: [1, 2], {"m1":3, "m2":4}
Second example: [1, 2.5], {"m1":3, "m3":4.5}
Merge: [1, 2.5, 3, ?], {"a2":2, "m2":4, -5:4.50, "n2":?}

Since attributes a1 and m1 appear in the domains of both original examples, the new example can only be constructed if these values match. They indeed do, and the merged example has all the values defined in the domain (a1, a3 and m1, and meta-attributes a2 and m2). In addition, it got the value of the meta-attribute m3 from the second example, which is identified only by its id, -5, since it is not registered with the domain. Values of the two new attributes are left undefined.
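
The merging rules just described can be sketched in plain Python. This is a simplified model, not Orange's code: the function merge is hypothetical, examples are plain dictionaries keyed by attribute name, no distinction is made between ordinary and meta-attributes, and raising ValueError on a mismatch is this sketch's own choice.

```python
def merge(target_names, *examples):
    """Fill target_names from several {name: value} examples.

    Raises ValueError when two examples disagree on a shared name.
    """
    merged = {}
    for example in examples:
        for name, value in example.items():
            if name in merged and merged[name] != value:
                raise ValueError("conflicting values for %r" % name)
            merged[name] = value
    # Names requested by the target but absent everywhere stay unknown (None).
    result = {name: merged.get(name) for name in target_names}
    # Values the target does not ask for are carried along as extras.
    extras = {n: v for n, v in merged.items() if n not in target_names}
    return result, extras


first = {"a1": 1, "a2": 2, "m1": 3, "m2": 4}
second = {"a1": 1, "a3": 2.5, "m1": 3, "m3": 4.5}
result, extras = merge(["a1", "a3", "m1", "n1", "a2", "m2", "n2"], first, second)

try:
    merge(["a1"], {"a1": 1}, {"a1": 2})   # shared attribute with different values
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

Here result plays the role of the values registered with the new domain, and extras corresponds to the unregistered m3 that is copied along.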

Basic methods

Examples have certain list-like behaviour. You can address their values. You can use for loops to iterate through an example's values (for value in example: ...). You can query an example's length; it equals the number of attributes, including the class attribute. You cannot, however, change the "length" of an example by inserting or removing attributes. The number and types of attributes are defined by the domain, and the domain cannot be changed once the example is constructed. Finally, you can convert an example to an ordinary Python list.

Examples can be indexed by integer indices, attribute descriptors or attribute names. Since "age" is the first attribute in the lenses dataset, the statements below are equivalent.

>>> age = data.domain["age"]
>>> example = data[0]
>>> print example[0]
young
>>> print example[age]
young
>>> print example["age"]
young

Example's values can be modified. We shall increase the age (and if it becomes larger than 2, reset it to 0).

>>> example = data[0]
>>> print data[0]
['young', 'myope', 'no', 'reduced', 'none']
>>> example[age] = (int(example[age])+1) % 3
>>> print data[0]
['pre-presbyopic', 'myope', 'no', 'reduced', 'none']

The lesson which we've learned by the way is that by example = data[0] we don't get a fresh copy of example but a reference to the first example in the data. If you need a fresh copy, you need to clone the example, as explained above.
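
The same distinction between a reference and a copy exists for ordinary Python lists, which makes for a quick experiment without Orange. In the sketch below, table is a hypothetical stand-in for data, and list(...) plays the role of the cloning constructor.

```python
AGE_VALUES = ["young", "pre-presbyopic", "presbyopic"]

table = [["young", "myope", "no", "reduced", "none"]]   # stand-in for data

row = table[0]            # a reference to the first row, not a copy
clone = list(table[0])    # an independent copy, in the spirit of orange.Example(example)

# Advance age to the next value, wrapping around after the last one.
row[0] = AGE_VALUES[(AGE_VALUES.index(row[0]) + 1) % len(AGE_VALUES)]
```

After the assignment, table[0] shows the new age because row and table[0] are the same object, while clone still holds the old value.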

The last value in the example is the class value. Do not access it as example[-1], since this form is reserved for future use (with meta values); use getclass and setclass instead.

Methods

getclass()
Returns the example's class as Value.
setclass(value)
Sets the example's class. The argument can be a Value, number or string.
setvalue(value)
The argument value should be a qualified orange.Value, that is, it should have the field variable defined. value.variable should be one of the attributes in the example's domain (either ordinary or a registered meta-attribute). The function sets the value of that attribute to the given value. It is equivalent to calling self[value.variable] = value.

This function makes it easy to assign prescribed values to examples; see an example in the section about meta values.

native([nativity])
Converts the example into an ordinary Python list. If the optional argument is 1 (default), the list will contain objects of type Value; if it is 0, the list will contain native Python objects (strings for discrete and numbers for continuous attribute values).
compatible(other_example, ignoreClass = 0)
Returns true if the two examples are compatible, that is, if they are the same in all attributes which are known for both examples. If the optional second argument is true, class values are ignored, so two examples are compatible even though they differ in their class values.
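
The notion of compatibility can be made concrete with a short pure-Python sketch. It is only a model of the rule stated above: examples are plain lists, None stands in for an unknown value, the last element is taken to be the class, and the function name mirrors the method purely for readability.

```python
UNKNOWN = None  # placeholder for an unknown value in this sketch


def compatible(first, second, ignore_class=False):
    """True if the two lists agree wherever both values are known.

    The last element of each list is treated as the class value.
    """
    if len(first) != len(second):
        raise ValueError("examples must have the same length")
    pairs = list(zip(first, second))
    if ignore_class:
        pairs = pairs[:-1]
    return all(a == b for a, b in pairs if a is not UNKNOWN and b is not UNKNOWN)


a = ["young", "myope", UNKNOWN, "reduced", "none"]
b = ["young", UNKNOWN, "yes", "reduced", "soft"]

with_class = compatible(a, b)                         # classes differ
without_class = compatible(a, b, ignore_class=True)   # only attributes compared
```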

Hashing

The hash function for examples (accessible via Python's built-in function hash; see the Python documentation) is computed using CRC32. To some extent, you can also use it as a random number (this is done, for instance, by RandomClassifier).
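
To illustrate the idea of a CRC32-based hash that doubles as a reproducible "random" number, here is a sketch using Python's zlib module. The byte encoding of the values is an assumption of this sketch and will not reproduce the numbers Orange computes; example_hash and pseudo_random_choice are hypothetical names.

```python
import zlib


def example_hash(values):
    """CRC32 over a simple text encoding of the values (the encoding is this sketch's choice)."""
    payload = "\x00".join(str(v) for v in values).encode("utf-8")
    return zlib.crc32(payload) & 0xFFFFFFFF   # force an unsigned 32-bit result


def pseudo_random_choice(values, options):
    """Deterministically pick one option for an example from its hash."""
    return options[example_hash(values) % len(options)]


ex = ["young", "myope", "no", "reduced", "none"]
pick = pseudo_random_choice(ex, ["none", "soft", "hard"])
```

The same example always gets the same pick, which is the property that makes a hash usable in place of a random number.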

Meta Values

Data examples in Orange are described by a fixed number and types of values, defined by the domain descriptor. There is, however, a way to attach additional attributes to examples. Such attributes (we call them meta-attributes) are not used for learning but can carry additional information, such as a patient's name or the number of times the example was misclassified during some test procedure. The most common additional information is the example's weight. To make things even more complex, we have already encountered problems for which examples had to have more than one weight each.

In contrast to ordinary attributes, the number of meta values can vary between examples from the same domain (or even the same ExampleTable). Ordinary attributes are addressed by positions (e.g. example[0] is the first and example[4] is the fifth value in the example). Meta-attributes are addressed by id's; id's are really negative integers, but you should see them as "keys". An example can have any number of meta values with distinct id's. The domain descriptor can, but doesn't need to, know about them.

Id's are "created" by function orange.newmetaid(). (The function uses a very elaborate procedure for generating unique negative integers; the procedure might reveal itself only to the brightest if they make a few calls to the function and carefully observe the returned values.) So, if you want to assign meta values to examples, you need to obtain an id from orange.newmetaid(); afterwards, you can use it on any examples you want.
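
For readers who prefer not to run the experiment, the behaviour can be imitated with a toy generator. new_meta_id below is a hypothetical stand-in for orange.newmetaid(); it assumes only that every call yields a fresh negative integer, and the actual numbers Orange returns may differ.

```python
import itertools

_next_id = itertools.count(-1, -1)   # yields -1, -2, -3, ...


def new_meta_id():
    """Hypothetical stand-in for orange.newmetaid(): a fresh negative integer per call."""
    return next(_next_id)


weight_id = new_meta_id()
checked_id = new_meta_id()
```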

If there is a particular attribute associated with the meta value, you can also pass the attribute as an argument to orange.newmetaid. If the attribute has already been registered with some id, the id can be reused. Doing so is recommended but not necessary.

Meta values can also be loaded from files in tab-delimited or Excel format. In this case, you only need to know the names of the corresponding meta-attributes; the id's are taken care of while the data is loaded. See the documentation on file formats.

Most often, you will use an id for assigning weights: to each example you assign a number (it can be greater or smaller than one; most algorithms will even tolerate negative weights) and pass the id to the learning algorithm. Let's do this with random weights.

example2.py (uses lenses.tab)

>>> import orange, random
>>> random.seed(0)
>>> data = orange.ExampleTable("lenses")
>>> id = orange.newmetaid()
>>> for example in data:
...     example[id] = random.random()
...
>>> print data[0]
['young', 'myope', 'no', 'reduced', 'none'], {-2:0.84}

The example now consists of two parts: ordinary attributes, which resemble a list since they are addressed by positions (e.g. the first value is "young"), and meta values, which are more like dictionaries, where the id (-2) is a key and 0.84 is a value (of type orange.Value, as all values in Example).
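
The two-part structure can be modelled in a few lines of plain Python. ToyExample is a hypothetical class, not Orange's Example; it assumes only what the text states, namely that non-negative indices address the list part and negative id's address the dictionary part.

```python
class ToyExample(object):
    """Hypothetical two-part container: a positional list plus a dict of meta values."""

    def __init__(self, values):
        self.values = list(values)   # ordinary attributes, addressed by position
        self.metas = {}              # meta values, addressed by negative id

    def __getitem__(self, key):
        return self.metas[key] if key < 0 else self.values[key]

    def __setitem__(self, key, value):
        if key < 0:
            self.metas[key] = value
        else:
            self.values[key] = value

    def __len__(self):
        return len(self.values)      # meta values do not count towards the length


ex = ToyExample(["young", "myope", "no", "reduced", "none"])
ex[-2] = 0.84                        # stored in the dictionary part under key -2
```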

To make a learner aware of weights, you only need to pass the id as an additional argument. Therefore, to train a Bayesian classifier on our randomly weighted examples, you would call it as follows.

>>> bayes = orange.BayesLearner(data, id)

Many other functions accept weights in similar fashion.

>>> print orange.getClassDistribution(data)
<15.000, 5.000, 4.000>
>>> print orange.getClassDistribution(data, id)
<9.691, 3.232, 1.969>

It is easy to see how this system also accommodates examples having different weights to be used for different procedures in the same experimental setup.
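
What a weighted class distribution means can be shown with a small pure-Python sketch. class_distribution is a hypothetical helper working on plain lists (the last element of each row being the class); it is not orange.getClassDistribution and the data below is made up for the illustration.

```python
from collections import defaultdict


def class_distribution(rows, weights=None):
    """Sum of weights per class value; every row counts as 1.0 when weights is None."""
    totals = defaultdict(float)
    for i, row in enumerate(rows):
        totals[row[-1]] += 1.0 if weights is None else weights[i]
    return dict(totals)


rows = [["young", "none"], ["young", "soft"], ["presbyopic", "none"]]
weights = [0.5, 2.0, 0.25]

unweighted = class_distribution(rows)
weighted = class_distribution(rows, weights)
```

Keeping several weight lists around and passing a different one to each call corresponds to using several weight id's in the same experimental setup.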

As mentioned in documentation on orange.Domain, you can enhance the output by registering an attribute descriptor for meta-attribute with id -2 in the example's domain.

>>> w = orange.FloatVariable("w")
>>> data.domain.addmeta(id, w)

Meta-attribute can now be indexed just as any other attribute:

>>> print data[0][id]
0.844422
>>> print data[0][w]
0.844422
>>> print data[0]["w"]
0.844422

A more important consequence of registering an attribute with the domain is that it enables automatic value conversion.

Let us add a nominal meta-attribute which will tell whether the example has been double-checked by the domain expert. The attribute's values will be "yes" and "no".

part of example3.py (uses lenses.tab)

>>> ok = orange.EnumVariable("ok?", values=["no", "yes"])
>>> ok_id = orange.newmetaid()
>>> data[0].setmeta(ok_id, "yes")
Traceback (most recent call last):
  File "C:\PROGRA~1\python22\lib\site-packages\Pythonwin\pywin\framework\scriptutils.py", line 301, in RunScript
    exec codeObject in __main__.__dict__
  File "D:\ai\OrangeTest\reference-misc\example3.py", line 23, in ?
    data[0].setmeta(ok_id, "yes")
TypeError: cannot convert 'yes' to a value of an unknown attribute

This can't work since we haven't told Orange that ok_id corresponds to the attribute ok, and thus it cannot convert the string "yes" to an orange.Value. You should perform the conversion manually.

>>> data[0][ok_id] = orange.Value(ok, "yes")

However, if you register the meta-attribute with the domain descriptor, Orange can find a descriptor and perform the conversion itself.

>>> data.domain.addmeta(ok_id, ok)
>>> data[0][ok_id] = "yes"
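
Why registration makes the difference can be seen in a toy model in plain Python. ToyDomain and to_value are hypothetical; the sketch assumes only that converting a string requires knowing the attribute's list of values, and that the domain is where this knowledge is looked up by id.

```python
class ToyDomain(object):
    """Hypothetical registry mapping meta id's to lists of allowed string values."""

    def __init__(self):
        self.metas = {}

    def addmeta(self, meta_id, values):
        self.metas[meta_id] = list(values)


def to_value(domain, meta_id, raw):
    """Convert a string to a value index using the descriptor registered for meta_id."""
    if meta_id not in domain.metas:
        raise TypeError("cannot convert %r to a value of an unknown attribute" % raw)
    return domain.metas[meta_id].index(raw)


domain = ToyDomain()
ok_id = -3   # arbitrary id chosen for the sketch

try:
    to_value(domain, ok_id, "yes")       # nothing registered yet: conversion fails
    converted_before = True
except TypeError:
    converted_before = False

domain.addmeta(ok_id, ["no", "yes"])     # register the attribute's values
index_after = to_value(domain, ok_id, "yes")
```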

As before, you can use either id ok_id, attribute descriptor ok or attribute's name "ok?" to index the example.

>>> data[0][ok_id] = "yes"
>>> data[0][ok] = "yes"
>>> data[0]["ok?"] = "yes"

It is even possible to use the meta-attribute with the setvalue function.

>>> import random
>>> no_yes = [orange.Value(ok, "no"), orange.Value(ok, "yes")]
>>> for example in data:
...     example.setvalue(no_yes[random.randint(0, 1)])
...

Methods

setmeta(value), getmeta()
Obsolete functions for setting and getting meta values.
getmetas([key-type]), getmetas(optional, [key-type])
Returns example's meta values as a dictionary. Key type can be int (default), str or orange.Variable, and determines whether the keys in the dictionary will be meta-id's, attribute names or attribute descriptors. In the latter two cases, the function will only return the meta values that are registered in the domain (there are no descriptors/names associated with other values). In either case, the dictionary contains only a copy of the values: changing the dictionary won't affect the example's meta values.

The argument optional tells the method to return only the optional or only the non-optional meta attributes: only the attributes whose optional flag equals the given value are returned. If the argument is absent, both types of attributes are returned.

The code below prints the dictionary four times: with the default key type and with each of the three explicit key types.

part of basket.py (uses inquisition2.basket)

data = orange.ExampleTable("inquisition2")
example = data[4]
print example.getmetas()
print example.getmetas(int)
print example.getmetas(str)
print example.getmetas(orange.Variable)

hasmeta(id | attribute-descriptor | name)
Returns True if the example has the meta attribute.
removemeta(id | attribute-descriptor | name)
Removes a meta value. To use an attribute-descriptor or name, the corresponding attribute must be registered.
getweight(id | attribute-descriptor | name | None)
Returns the value of the specified meta attribute. The value must be continuous; an exception is raised otherwise. If the argument is zero or None, the function returns 1.0 (since a weight id of 0 normally means that examples are not weighted).

If you are writing your own learner, you should always use this function to retrieve example's weight. It is practical: most functions in Orange that can optionally accept weights, understand a weight id of 0 as "no weights"; this function takes care of that. In particular, never attempt to do this:

>>> weight = example[id]

If examples are not weighted, id will be zero and you'll get the value of the first attribute...

setweight((id | attribute-descriptor | name) [, weight])
Sets a weight. If id is zero or None, nothing happens. weight must be a number; if omitted, the weight is set to 1.0.
removeweight(id | attribute-descriptor | name)
A simplified equivalent of removemeta. It does exactly the same thing except that it doesn't accept anything but an integer for id. If id is zero or None, this function does nothing.
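
The convention that a weight id of 0 (or None) means "no weights" can be summarized in a pure-Python sketch. WeightedExample is a hypothetical class that mirrors the three weight methods by name only; it accepts integer id's exclusively and is not Orange's implementation.

```python
class WeightedExample(object):
    """Hypothetical stand-in; an id of 0 or None means 'examples are not weighted'."""

    def __init__(self, values):
        self.values = list(values)
        self.metas = {}

    def getweight(self, weight_id):
        if not weight_id:             # 0 or None: unweighted, every example counts as 1.0
            return 1.0
        return float(self.metas[weight_id])

    def setweight(self, weight_id, weight=1.0):
        if weight_id:                 # silently ignore 0 and None
            self.metas[weight_id] = float(weight)

    def removeweight(self, weight_id):
        if weight_id:
            del self.metas[weight_id]


ex = WeightedExample(["young", "myope", "no", "reduced", "none"])
ex.setweight(-2, 0.84)
ex.setweight(0, 5.0)                  # no effect: 0 means "no weights"

stored = ex.getweight(-2)
default = ex.getweight(0)
pitfall = ex.values[0]                # what plain indexing with id 0 would hand back
```

The last line shows the trap described above: indexing with an id of 0 returns the first attribute value rather than a weight, which is why a learner should go through getweight.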