Prev: Regression, Next: Other Techniques for Orange Scripting, Up: On Tutorial 'Orange for Beginners'
Association rules are fun to work with in Orange. One reason for this is Python, and in particular an implementation that lets a list of association rules behave just like any other list in Python. That is, you can select parts of the list, you can remove rules, and you can even add them (yes, append() works on Orange association rules!).
But let's start with the basics. For association rules, Orange straightforwardly implements the APRIORI algorithm (see Agrawal et al., Fast discovery of association rules, a chapter in Advances in Knowledge Discovery and Data Mining, 1996); Orange includes an optimized version of the algorithm that works on tabular data. For a number of reasons (but mostly for convenience), association rules should be constructed and managed through the interface provided by the orngAssoc module. As implemented in Orange, the association rule construction procedure does not handle continuous attributes, so make sure that your data is categorized. Also, class variables are treated just like attributes. For the examples in this tutorial, we will use data from the data set imports-85.tab, which surveys different types of cars and lists their characteristics. We will use only the first ten attributes from this data set and categorize them so that three equally populated intervals are created for each continuous variable. This will be done through the following part of the code:
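The Orange listing referred to here is not reproduced in this text. Conceptually, though, equal-frequency categorization into three intervals works as in the following plain-Python sketch; the function names and the sample price values are invented for illustration, not taken from Orange:

```python
def equal_frequency_cuts(values, n_bins=3):
    """Cut points that split `values` into n_bins equally populated intervals."""
    ordered = sorted(values)
    return [ordered[i * len(ordered) // n_bins] for i in range(1, n_bins)]

def categorize(value, cuts):
    """Map a continuous value to a discrete bin label."""
    for i, cut in enumerate(cuts):
        if value < cut:
            return "bin%d" % i
    return "bin%d" % len(cuts)

# Nine made-up car prices; with three bins, each bin ends up with three values.
prices = [5118, 6295, 6575, 7126, 7775, 9095, 10945, 13495, 16500]
cuts = equal_frequency_cuts(prices, 3)
labels = [categorize(p, cuts) for p in prices]
```

The real script delegates exactly this work to Orange's discretization preprocessor; the sketch only shows what "three equally populated intervals" means for one attribute.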
Now, to our examples. The first one uses the data set constructed with the above script and shows how to build a list of association rules with support of at least 0.4. Next, we select a subset of the first five rules, print them out, delete the first three rules, and repeat the printout. The script that does this is:
part of assoc1.py (uses imports-85.tab)
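Since the rule list behaves like an ordinary Python list, the slicing, printing, and deleting steps of assoc1.py can be illustrated on plain data. The rule strings below are invented for illustration; in the real script the list is built by Orange's rule inducer from imports-85.tab:

```python
# Stand-in for the induced rule list (rule strings are made up;
# the attribute names mimic those in imports-85.tab).
rules = ["num-of-doors=four -> body-style=sedan",
         "body-style=sedan -> num-of-doors=four",
         "fuel-type=gas -> aspiration=std",
         "aspiration=std -> fuel-type=gas",
         "drive-wheels=fwd -> engine-location=front",
         "engine-location=front -> drive-wheels=fwd"]

subset = rules[:5]           # select the first five rules
for r in subset:
    print(r)

del rules[:3]                # delete the first three rules in place
for r in rules[:5]:          # repeat the printout on what remains
    print(r)
```

The same slicing and del statements work on an Orange rule list, which is the point the tutorial is making.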
Notice that when printing out the rules, the user can specify which rule evaluation measures are to be printed. Choose any from ['support', 'confidence', 'lift', 'leverage', 'strength', 'coverage'].
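These measures are simple functions of how many examples match the left side (X), the right side (Y), or both sides of a rule X -> Y. A plain-Python sketch under the usual textbook definitions (strength is omitted here, and the counts are invented):

```python
def rule_measures(n_total, n_left, n_right, n_both):
    """Common evaluation measures for a rule X -> Y, given the number of
    examples matching X (n_left), Y (n_right), and both (n_both)."""
    support = n_both / n_total
    confidence = n_both / n_left
    lift = confidence / (n_right / n_total)          # confidence relative to Y's base rate
    leverage = support - (n_left / n_total) * (n_right / n_total)
    coverage = n_left / n_total                      # how often the rule applies at all
    return {"support": support, "confidence": confidence,
            "lift": lift, "leverage": leverage, "coverage": coverage}

# Invented counts: 10 examples, 6 match X, 5 match Y, 4 match both.
m = rule_measures(n_total=10, n_left=6, n_right=5, n_both=4)
```

A lift above 1 means X and Y co-occur more often than independence would predict, which is why it is a popular filtering criterion.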
The second example uses the same data set, but first prints out the five most confident rules. Then it shows a rather advanced type of filtering: every rule stores its support, confidence, and the other measures, and these may be used when constructing your own filter functions. The one in our example uses support and lift. [If you have just started with Python: lambda is a compact way to specify a simple function without using a def statement. As a function, it uses its own namespace, so the minimal lift and support requested in our example should be passed as function arguments.] Here goes the code:
part of assoc2.py (uses imports-85.tab)
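The listing itself is not reproduced here, but the lambda-based filtering it performs can be illustrated without Orange. Below, the rules are plain dictionaries with invented numbers, whereas Orange rule objects carry the same measures as attributes:

```python
# Stand-in rules: dictionaries holding the measures that Orange rule
# objects expose as attributes (all numbers invented for illustration).
rules = [{"name": "r1", "support": 0.50, "lift": 1.1},
         {"name": "r2", "support": 0.35, "lift": 1.4},
         {"name": "r3", "support": 0.45, "lift": 1.3}]

# Minimal support and lift are passed as default arguments, so the
# lambda carries them in its own namespace, as the tutorial notes.
keep = lambda r, minsup=0.4, minlift=1.2: (r["support"] >= minsup
                                           and r["lift"] >= minlift)

selected = list(filter(keep, rules))
```

With these made-up numbers, only one rule passes both thresholds, mirroring the outcome described below for the real rule set.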
Just one rule with the requested support and lift is found in our rule set:
Finally, for our third example, we introduce cloning. Cloning helps if you need to work with different rule subsets that stem from a common rule set created from some data (admittedly, cloning is quite useless in this particular example, but it may be very useful otherwise). So we use cloning to make a copy of the set of rules, sort them first by support and then by confidence, and print out a few of the best rules. We have also lowered the required minimal support, just to see how many rules we obtain this way.
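The clone-then-sort idea can be sketched in plain Python. Here a shallow list copy and a two-part sort key stand in for the operations on the Orange rule set, and the rule data is invented:

```python
# Stand-in rule set (measures invented for illustration).
rules = [{"support": 0.45, "confidence": 0.80},
         {"support": 0.55, "confidence": 0.70},
         {"support": 0.55, "confidence": 0.90}]

clone = list(rules)  # a copy we can reorder without touching the original order

# Sort by support first, breaking ties by confidence, both decreasing.
clone.sort(key=lambda r: (r["support"], r["confidence"]), reverse=True)

best = clone[:2]     # keep a few best rules for printing
```

Because the tuple key compares support before confidence, ties in support are resolved by confidence, which is exactly the "sort by first support and then confidence" the example calls for.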
part of assoc3.py (uses imports-85.tab)
The output of this script is: