Orange contains two simple classes for computing basic statistics of continuous attributes: BasicAttrStat holds the statistics for a single attribute, and DomainBasicAttrStat holds the statistics for all attributes in the domain.
Attributes
n is the sum of the weights of the examples that contributed a value; with the default weight of 1.0 this is simply their count.
Methods
The weight argument of add is optional; the default is 1.0.
You will most likely not construct the class yourself, but instead call DomainBasicAttrStat to compute the statistics for all continuous attributes in the dataset.
Nevertheless, here's how the class works. Values are fed into add; this is usually done by DomainBasicAttrStat, but you can also traverse the examples and feed the values from Python if you want to. For each value, add checks and, if necessary, adjusts min and max, adds the value to sum and its square to sum2, and adds the weight to n. If holdRecomputation is false, it also recomputes the average and the deviation. If it is true, this is postponed until recompute is called. Postponing the recomputation makes sense when using the class from C++. From Python, the recomputation takes far less time than the interpreter overhead, so you can leave holdRecomputation set to false.
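The update steps described above can be sketched in plain Python. This is a minimal stand-in and not Orange's implementation: the class name RunningStat and the attribute hold_recomputation are hypothetical, and the use of weighted sums and of the population deviation (square root of sum2/n minus the squared average) are assumptions, since the text does not state the exact formulas.

```python
import math


class RunningStat:
    """Hypothetical stand-in for the accumulator described above (not Orange's code)."""

    def __init__(self, hold_recomputation=False):
        self.min = None
        self.max = None
        self.sum = 0.0
        self.sum2 = 0.0
        self.n = 0.0
        self.avg = None
        self.dev = None
        self.hold_recomputation = hold_recomputation

    def add(self, value, weight=1.0):
        # Adjust the running extremes if the new value falls outside them.
        if self.min is None or value < self.min:
            self.min = value
        if self.max is None or value > self.max:
            self.max = value
        # Assumption: the sums are weighted, so that n, sum and sum2 stay consistent.
        self.sum += weight * value
        self.sum2 += weight * value * value
        self.n += weight
        if not self.hold_recomputation:
            self.recompute()

    def recompute(self):
        if self.n > 0:
            self.avg = self.sum / self.n
            # Assumption: population deviation; clamp tiny negative rounding errors.
            variance = max(self.sum2 / self.n - self.avg ** 2, 0.0)
            self.dev = math.sqrt(variance)


stat = RunningStat(hold_recomputation=True)
for v in [2.0, 4.0, 4.0, 6.0]:
    stat.add(v)
stat.recompute()  # avg and dev stay unset until this call
```

With recomputation held, each add only touches the five running totals, and the division and square root happen once at the end.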
Note that the statistics do not include the median or, more generally, any quantiles. That's because the class only collects statistics that can be computed on the fly, without remembering the data. If you need quantiles, you will need to construct a ContDistribution.
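To illustrate why quantiles need the data to be remembered, here is a small sketch that keeps every distinct value together with its accumulated weight and reads a quantile off the cumulative weights. The function weighted_quantile and the dictionary freq are hypothetical; this is not how ContDistribution is implemented, only an example of the kind of stored information a quantile requires.

```python
def weighted_quantile(freq, q):
    """Return the smallest value whose cumulative weight reaches q of the total.

    freq is a hypothetical value -> weight mapping standing in for a stored
    distribution; it is not Orange's ContDistribution.
    """
    if not freq:
        raise ValueError("empty distribution")
    total = sum(freq.values())
    cumulative = 0.0
    for value in sorted(freq):
        cumulative += freq[value]
        if cumulative >= q * total:
            return value
    return max(freq)


# Made-up values; each one is remembered, unlike in the running statistics.
freq = {}
for v in [5.1, 4.9, 4.7, 5.1, 5.0]:
    freq[v] = freq.get(v, 0.0) + 1.0

median = weighted_quantile(freq, 0.5)
```

Unlike min, max, sum and sum2, the mapping grows with the number of distinct values, which is the price of being able to answer quantile queries.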
DomainBasicAttrStat behaves as a list of BasicAttrStat, except for a few details.
Methods
part of basicattrstat.py (uses iris.tab)
This will print
Note the "if a" in the script. It is needed because these statistics cannot be computed for discrete attributes, which are therefore represented by None. The purge method gets rid of them by removing the None entries from the list.
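The effect of the guard and of purging can be shown with an ordinary Python list. The list below is a hypothetical stand-in with made-up values, not a real DomainBasicAttrStat, and the list comprehension only mimics what purge is described as doing.

```python
# Made-up per-attribute results: a dict for each continuous attribute,
# None where the statistics cannot be computed (a discrete attribute).
stats = [{"name": "x", "avg": 1.5}, None, {"name": "y", "avg": 0.2}]

# Guarded traversal: "if a" skips the None placeholders.
lines = []
for a in stats:
    if a:
        lines.append("%s %.1f" % (a["name"], a["avg"]))

# Purging: drop the placeholders once, so later loops need no guard.
purged = [a for a in stats if a is not None]
```

Keeping the None entries preserves the correspondence between list positions and attributes in the domain; purging gives that up in exchange for simpler loops.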
DomainBasicAttrStat behaves like an ordinary list, except that its elements can also be indexed by attribute descriptors or attribute names.
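Indexing by name on top of ordinary list indexing can be sketched as follows. NamedStatList is a hypothetical class written for illustration, with made-up entries; it covers lookup by position and by name only, not by attribute descriptor, and it says nothing about how Orange implements this.

```python
class NamedStatList(list):
    """Hypothetical list whose items can also be looked up by their 'name' field."""

    def __getitem__(self, key):
        if isinstance(key, str):
            # Linear search over the entries, skipping None placeholders.
            for item in self:
                if item is not None and item["name"] == key:
                    return item
            raise KeyError(key)
        # Integers and slices fall through to normal list behaviour.
        return super().__getitem__(key)


stats = NamedStatList([{"name": "x", "avg": 1.5}, None, {"name": "y", "avg": 0.2}])
by_position = stats[2]
by_name = stats["y"]
```

Both lookups return the same object, so code can use whichever key it has at hand.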
If you need more statistics, see information on distributions.