Objects derived from Distribution are used throughout Orange to store various distributions. These often, but not necessarily, apply to the distribution of values of a certain attribute on some dataset. You will most often encounter two classes derived from Distribution: DiscDistribution stores discrete and ContDistribution stores continuous distributions. To some extent, they both resemble dictionaries, with attribute values as keys and the number of examples with a particular value as elements.
Class Distribution contains the common methods for different types of distributions. In addition, its constructor can be used to construct objects of type DiscDistribution and ContDistribution (class Distribution itself is abstract, so no instances of that class can actually exist).
part of distributions.py (uses adult_sample.tab)
This simple script prints out the distribution of the attribute "workclass" on the dataset "adult_sample". The resulting distribution is of type DiscDistribution since the attribute is discrete. The printed numbers are counts of examples that have a particular attribute value.
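To make "counts of examples per attribute value" concrete, here is a minimal plain-Python sketch. It does not use Orange: the sample values and the helper name count_values are made up for illustration, and the tally relies only on the standard library's collections.Counter.

```python
from collections import Counter


def count_values(values):
    """Tally how many examples carry each attribute value."""
    return Counter(values)


# Hypothetical "workclass" column; these values are NOT read from adult_sample.tab.
workclass = ["Private", "State-gov", "Private", "Self-emp", "Private"]
counts = count_values(workclass)

# Print one line per value, with the number of examples that have it.
for value, n in counts.items():
    print("%-10s %d" % (value, n))
```

The result behaves like a dictionary keyed by attribute value, which is the resemblance the text describes.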
part of distributions.py (uses adult_sample.tab)
Enough introduction. Here are Distribution's attributes and methods.
Attributes
abs as long as the distribution is not normalized.
random().
Methods
DiscDistribution or ContDistribution, depending on the attribute type. If attribute is the only argument, it must be an attribute descriptor (see Variable). In that case, an empty distribution is constructed. If examples are given as well, the attribute's distribution is computed, as seen in the above example. In that case, attribute can also be given by name or by its position in the domain. If examples are weighted, the id of the meta-attribute with weights is passed as the third argument (default is 0, no weights).
If attribute is given by descriptor, it doesn't need to exist in the domain, but it must be computable from the given examples. This way, it is possible to obtain distributions for attributes constructed by constructive induction or for discretized attributes, without translating the entire dataset. There is an example of this in the documentation on attribute descriptors.
Value, integers and symbolic names (if variable is defined) can be used. For continuous elements, use Value or a continuous number (e.g. cont[3.14]).
To get the number of examples with workclass="private", you can use any of the three forms below:
Elements cannot be removed from distributions.
Length of distribution equals the number of possible values for discrete distributions (if variable is set), the value with the highest index encountered (if distribution is discrete and variable is not set) or the number of different values encountered (for continuous distributions).
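The indexing and length rules above can be pictured with a small stand-in. This is a hedged sketch in plain Python, not Orange's implementation: the class TinyDisc, its names argument (playing the role of variable) and the counts are all hypothetical.

```python
class TinyDisc:
    """Hypothetical stand-in for a discrete distribution (not Orange's class)."""

    def __init__(self, counts, names=None):
        self.counts = list(counts)  # one count per value index
        self.names = names          # optional symbolic names, like `variable`

    def _index(self, key):
        # Integers index directly; symbolic names need the name list.
        if isinstance(key, int):
            return key
        if self.names is None:
            raise KeyError("symbolic names need a list of value names")
        return self.names.index(key)

    def __getitem__(self, key):
        return self.counts[self._index(key)]

    def __len__(self):
        # With names set, length is the number of possible values.
        return len(self.counts)


# Made-up counts for three made-up values.
disc = TinyDisc([685.0, 72.0, 28.0], names=["private", "self-emp", "state-gov"])
print(disc[0], disc["private"], len(disc))
```

Both lookups reach the same element; the symbolic form only works because the value names are known.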
weight (default is 1.0) was added. value can be orange.Value, an index (for discrete distributions), continuous number (for continuous distributions) or symbolic value, if variable is set.
abs), sets normalized to true and abs to 1.0. Fields cases and unknowns are unchanged.
keys()), not any continuous value from the distribution's range.
This method uses distribution's randomGenerator. If none has been constructed and/or assigned yet, one is constructed and stored for further use.
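A rough plain-Python model of adding weighted values and normalizing may help here. It rests on assumptions that the text does not spell out: abs is taken to be the sum of all elements, and normalization is taken to divide every element by that sum. The class TinyCounts is hypothetical, and the cases and unknowns fields are left out.

```python
class TinyCounts:
    """Hypothetical model of a distribution's bookkeeping (not Orange's class)."""

    def __init__(self):
        self.freq = {}           # value -> accumulated weight
        self.abs = 0.0           # assumed meaning: sum of all elements
        self.normalized = False

    def add(self, value, weight=1.0):
        # Each added example contributes its weight (1.0 by default).
        self.freq[value] = self.freq.get(value, 0.0) + weight
        self.abs += weight

    def normalize(self):
        # Assumption: every element is divided by the current sum.
        total = self.abs
        if total <= 0:
            return  # nothing to scale in an empty distribution
        for value in self.freq:
            self.freq[value] /= total
        self.abs = 1.0
        self.normalized = True


dist = TinyCounts()
dist.add("a")         # weight defaults to 1.0
dist.add("b", 3.0)    # a weighted example
dist.normalize()
print(dist.freq, dist.abs, dist.normalized)
```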
Discrete distributions can be constructed directly.
Methods
variable and allocates a list of appropriate size for the distribution.
None. You can use such distribution for random number generation.
This will print out approximately ten 0's, six 1's and four 2's. To name the values, you can assign an attribute descriptor.
None.
Besides those constructors, there are no other specific operations for discrete distributions.
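The random-number use mentioned above can be imitated in plain Python. This sketch does not call Orange. The probabilities 0.5, 0.3 and 0.2 are an assumption inferred from the stated split of roughly ten, six and four over twenty draws, and the helper name draw is made up.

```python
import random
from collections import Counter


def draw(probabilities, n, rng):
    """Draw n value indices, each index chosen with its given probability."""
    indices = range(len(probabilities))
    return rng.choices(indices, weights=probabilities, k=n)


rng = random.Random(0)                   # fixed seed so reruns give the same draws
sample = draw([0.5, 0.3, 0.2], 20, rng)  # assumed probabilities, twenty draws
print(sample)
print(Counter(sample))                   # how often each index came up
```

With only twenty draws the tallies fluctuate, which is why the text says "approximately".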
Continuous distribution (ContDistribution) offers constructors similar to those of discrete distributions, except that instead of a list, it expects a dictionary, such as one returned by native. There are some specific methods.
Methods
variable.
None.
p percent of attribute's values are smaller than x. p must be a value between 0 and 100. For instance, if dage is a continuous distribution, quartiles can be printed by asking for p = 25, 50 and 75.
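One way to picture a percentile lookup on a value-to-count dictionary is the sketch below. It is a plain-Python approximation under stated assumptions, not Orange's algorithm: the function name percentile, the linear interpolation between neighbouring values, and the ages in dage are all chosen for illustration.

```python
def percentile(freq, p):
    """Return a value x with roughly p percent of the weight at or below it.

    freq maps values to counts. Linear interpolation between neighbouring
    values is an assumption of this sketch.
    """
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    items = [(v, c) for v, c in sorted(freq.items()) if c > 0]
    total = sum(c for _, c in items)
    if total <= 0:
        raise ValueError("distribution is empty")

    target = total * p / 100.0
    seen = 0.0
    prev_value = items[0][0]
    for value, count in items:
        if seen + count >= target:
            # Move from the previous value toward this one in proportion
            # to how much of this value's count is needed.
            fraction = (target - seen) / count
            return prev_value + fraction * (value - prev_value)
        seen += count
        prev_value = value
    return items[-1][0]


# Made-up ages standing in for a continuous distribution such as dage.
dage = {20.0: 2, 30.0: 4, 40.0: 2}
for p in (25, 50, 75):
    print(p, percentile(dage, p))
```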
x. If the value is not present, it is interpolated.

GaussianDistribution represents a Gaussian distribution.
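As a rough picture of what such an object models, the sketch below evaluates a Gaussian density scaled by a total weight, matching the characterization of the density as the Gaussian function multiplied by abs. It is plain Python rather than Orange code. The function name gaussian_density and the parameter abs_total (standing in for abs) are hypothetical.

```python
import math


def gaussian_density(x, mean=0.0, sigma=1.0, abs_total=1.0):
    """Gaussian probability density at x, multiplied by abs_total."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mean) / sigma
    return abs_total * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# Defaults mirror the normalized case: mean 0.0, sigma 1.0, abs 1.0.
print(gaussian_density(0.0))
# Doubling the total weight doubles the density everywhere.
print(gaussian_density(1.0, abs_total=2.0))
```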
Attributes
abs.
Methods
mean and sigma are 0.0 and 1.0 (normalized distribution), and abs is set to 1.0.
distribution must be either a ContDistribution, in which case its average and deviation become mean and sigma for the new distribution, or a GaussianDistribution, which is simply copied. abs is set to distribution.abs.
mean.
sigma.
sigma^2.
x (Gaussian function multiplied by abs).

Class distributions can be computed by calling orange.Distribution(data.domain.classVar, data, weightID) (weightID can be left out if examples are not weighted). Since this is a frequent operation, a shortcut is provided.
orange.getClassDistribution(examples[, weightID])
Computes the distribution of class values for the given data set. The result is of type DiscDistribution or ContDistribution.
Orange can compute distributions for all attributes in a single iteration over examples and store them in an object of type DomainDistributions. Its constructor accepts examples and, optionally, the ID of a meta-attribute with weights. The resulting object is list-like, except that not only integers but also attribute descriptors and names can be used for indexing.
The script below computes distributions for all attributes in the data and prints out distributions for discrete and averages for continuous attributes.
part of distributions.py (uses adult_sample.tab)
To get the distribution for, say, attribute "age", you can use its index in the domain, its name or its descriptor.
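The idea of collecting every attribute's distribution in one pass, and then indexing the result by position or by name, can be sketched in plain Python. This is not Orange's DomainDistributions: the class TinyDomainDistributions, the attribute names and the rows are hypothetical, and lookup by descriptor is not modelled.

```python
from collections import Counter


class TinyDomainDistributions:
    """Hypothetical list-like holder of per-attribute counts (not Orange's class)."""

    def __init__(self, names, rows):
        self.names = list(names)
        self.dists = [Counter() for _ in self.names]
        for row in rows:  # a single pass over the examples
            for dist, value in zip(self.dists, row):
                dist[value] += 1

    def __getitem__(self, key):
        # Accept either a position or an attribute name.
        if isinstance(key, str):
            key = self.names.index(key)
        return self.dists[key]

    def __len__(self):
        return len(self.dists)


# Made-up examples with two attributes each.
rows = [(39, "State-gov"), (50, "Private"), (38, "Private")]
dd = TinyDomainDistributions(["age", "workclass"], rows)
print(dd["age"] is dd[0])
print(dd["workclass"]["Private"])
```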