## Overview

AssRuleX (Association Rule eXtractor) is the second most important module of the Coron platform. This module is responsible for the extraction of different sets of association rules. With AssRuleX one can extract the following association rules:

rule type | command-line option | where to read more |
---|---|---|
all valid association rules | `-rule:all` | Summary in [1]. |
closed association rules | `-rule:closed` | Introduced in [1]. |
all informative association rules | `-rule:all_inf` | Summary in [2]. |
reduced informative association rules | `-rule:inf` | Summary in [2]. |
Generic Basis (GB) | `-rule:GB` | Summary in [2]. |
(all) Informative Basis (IB) | `-rule:all_IB` | Summary in [2]. |
reduced Informative Basis (IB) | `-rule:IB` | Summary in [2]. |
rare informative association rules | `-rule:rare` | Introduced in [1]. |

Note that by "all informative association rules" we mean the minimal non-redundant association rules (MNR); by "reduced informative association rules" we mean the transitive reduction of MNR (i.e. RMNR); and by "rare informative association rules" we mean the exact MRG rules.^{1}

## Command-line usage

`./core02_assrulex.sh [switches] <database> <min_supp> <min_conf> -alg:<alg> -rule:<rule>`

There are five compulsory parameters:

- database file (in .basenum, .bool, or .rcf format)
- minimum support
- minimum confidence
- name of the algorithm to be used
- rule set that you want to extract with the previously specified algorithm

The minimum support can be given as either an absolute value (a number of objects, e.g. 2) or a relative value (a percentage of the objects, e.g. 40%).

The minimum confidence can be given as a real value (between 0 and 1.0, e.g. 0.5), or as a percentage (between 0% and 100%, e.g. 50%).
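The two threshold notations can be normalized as in the following minimal Python sketch. It is an illustration only, not Coron's own code: the function names are hypothetical, and rounding a relative support up to the next whole object is an assumption, since the rounding rule is not stated here.

```python
from fractions import Fraction
import math


def parse_min_supp(value: str, n_objects: int) -> int:
    """Return min_supp as an absolute number of objects (hypothetical helper)."""
    value = value.strip()
    if value.endswith("%"):
        ratio = Fraction(value[:-1]) / 100
        # Assumption: round up, so the threshold is never weaker than requested.
        return math.ceil(ratio * n_objects)
    return int(value)


def parse_min_conf(value: str) -> Fraction:
    """Return min_conf as a ratio between 0 and 1 (hypothetical helper)."""
    value = value.strip()
    if value.endswith("%"):
        conf = Fraction(value[:-1]) / 100
    else:
        conf = Fraction(value)
    if not 0 <= conf <= 1:
        raise ValueError(f"min_conf out of range: {value}")
    return conf


print(parse_min_supp("2", 5))    # absolute value, used as-is
print(parse_min_supp("40%", 5))  # 40% of 5 objects
print(parse_min_conf("0.5") == parse_min_conf("50%"))
```

Exact fractions are used instead of floats so that a value such as 40% of 5 objects is not disturbed by floating-point error before rounding.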

The following combinations of algorithm and rule set can be used:

```
Apriori:
1) all association rules -rule:all
Close:
1) closed association rules -rule:closed
Pascal:
1) all association rules -rule:all
Pascal+:
1) all association rules -rule:all
2) closed association rules -rule:closed
Charm (v4) [triangular matrix; hash for FCIs]:
1) closed association rules -rule:closed
Eclat (v1) [triangular matrix]:
1) all association rules -rule:all
Zart (v2) [triangular matrix]:
1) all association rules -rule:all
2) closed association rules -rule:closed
3) all informative association rules -rule:all_inf
4) reduced informative association rules -rule:inf
5) Generic Basis (GB) -rule:GB
6) (all) Informative Basis (IB) -rule:all_IB
7) reduced Informative Basis (IB) -rule:IB
Eclat-Z [save to file; process file; output: like Zart]:
1) all association rules -rule:all
2) closed association rules -rule:closed
3) all informative association rules -rule:all_inf
4) reduced informative association rules -rule:inf
5) Generic Basis (GB) -rule:GB
6) (all) Informative Basis (IB) -rule:all_IB
7) reduced Informative Basis (IB) -rule:IB
BtB [rare eq. classes]:
1) rare association rules -rule:rare
Pseudo-Closed [FPCIs + their closures]:
1) Duquennes-Guigues Basis (DG) -rule:DG
dEclat [like Eclat (v1), with diffsets]:
1) all association rules -rule:all
dCharm [like Charm (v4), with diffsets]:
1) closed association rules -rule:closed
Touch (v1) [Charm + Talky-G + association]:
1) all informative association rules -rule:all_inf
2) reduced informative association rules -rule:inf
3) Generic Basis (GB) -rule:GB
4) (all) Informative Basis (IB) -rule:all_IB
5) reduced Informative Basis (IB) -rule:IB
```
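The listing above can also be kept as a small lookup table, for instance to check a combination before launching a run. The Python sketch below is only an illustration: the dictionary keys are the display names from the listing, not necessarily the values accepted by `-alg:` (only `-alg:zart` appears in this document), and the helper names are hypothetical.

```python
# Rule sets supported by each algorithm, transcribed from the listing above.
# Keys are display names, NOT confirmed -alg: option values.
ZART_LIKE = {"all", "closed", "all_inf", "inf", "GB", "all_IB", "IB"}

SUPPORTED_RULES = {
    "Apriori": {"all"},
    "Close": {"closed"},
    "Pascal": {"all"},
    "Pascal+": {"all", "closed"},
    "Charm": {"closed"},
    "Eclat": {"all"},
    "Zart": set(ZART_LIKE),
    "Eclat-Z": set(ZART_LIKE),
    "BtB": {"rare"},
    "Pseudo-Closed": {"DG"},
    "dEclat": {"all"},
    "dCharm": {"closed"},
    "Touch": {"all_inf", "inf", "GB", "all_IB", "IB"},
}


def is_supported(algorithm: str, rule: str) -> bool:
    """True if the listing pairs this algorithm with this -rule: value."""
    return rule in SUPPORTED_RULES.get(algorithm, set())


def algorithms_for(rule: str) -> list[str]:
    """All algorithms (sorted by name) that can extract the given rule set."""
    return sorted(a for a, rules in SUPPORTED_RULES.items() if rule in rules)


print(is_supported("Zart", "inf"))     # True
print(is_supported("Apriori", "inf"))  # False
print(algorithms_for("inf"))           # ['Eclat-Z', 'Touch', 'Zart']
```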

## Example

`./core02_assrulex.sh sample/laszlo.rcf 4 50% -names -alg:zart -rule:inf`

Result:

```
# Database file name: /home/jabba/eclipse2/releases/coron-v1-20090917/sample/laszlo.rcf
# Database file size: 208 bytes
# Number of lines: 5
# Total number of attributes: 5
# Number of non empty attributes: 5
# Number of attributes in average: 3.4
# Density: 68%
# min_supp: 4, i.e. 80%
# min_conf: 50%
# Chosen algorithm: Zart (v2) [triangular matrix]
# Rules to extract: reduced informative association rules
{b} => {e} (supp=4 [80.00%]; conf=1.000 [100.00%]; suppL=4 [80.00%]; suppR=4 [80.00%]; class=FF) +
{e} => {b} (supp=4 [80.00%]; conf=1.000 [100.00%]; suppL=4 [80.00%]; suppR=4 [80.00%]; class=FF) +
# Number of found rules: 2
# Number of FF rules: 2
```

The output begins and ends with statistics (the lines starting with `#`) about the dataset, the chosen parameters, and the number of rules found.

The `-names` option is highly recommended. It works only for .rcf files. With this option, attribute numbers are replaced with their names.

Let us see what a rule looks like:

`{b} => {e} (supp=4 [80.00%]; conf=1.000 [100.00%]; suppL=4 [80.00%]; suppR=4 [80.00%]; class=FF) +`

This means: the antecedent is {b} and the consequent is {e}. The support of the rule is 4, which is equivalent to 80% in this dataset (see the sample dataset). The confidence is 100%. The support of the left part of the rule is 4, and the support of the right part is also 4. The rule is in the FF class, i.e. both sides of the rule are frequent (frequent itemset implies frequent itemset). The trailing '+' indicates that the rule is closed, i.e. the union of the left and right side forms a closed itemset.
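For post-processing, a rule line in this format can be split into its fields with a regular expression. The following Python sketch is an illustration written against the sample line only. The pattern and the `parse_rule` helper are assumptions, not part of Coron, and may need adjusting for other output variants.

```python
import re

# Pattern for lines such as:
# {b} => {e} (supp=4 [80.00%]; conf=1.000 [100.00%]; ...; class=FF) +
RULE_RE = re.compile(
    r"^\{(?P<left>[^}]*)\}\s*=>\s*\{(?P<right>[^}]*)\}\s*"
    r"\((?P<fields>.*)\)\s*(?P<closed>\+?)\s*$"
)


def split_items(text: str) -> list[str]:
    """Split 'b, e' into ['b', 'e']."""
    return [item.strip() for item in text.split(",") if item.strip()]


def parse_rule(line: str) -> dict:
    """Parse one rule line into a dictionary (hypothetical helper)."""
    match = RULE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a rule line: {line!r}")
    fields = {}
    for part in match.group("fields").split(";"):
        key, _, value = part.strip().partition("=")
        # Drop the bracketed percentage, e.g. "4 [80.00%]" -> "4".
        fields[key] = value.split("[")[0].strip()
    return {
        "left": split_items(match.group("left")),
        "right": split_items(match.group("right")),
        "supp": int(fields["supp"]),
        "conf": float(fields["conf"]),
        "supp_left": int(fields["suppL"]),
        "supp_right": int(fields["suppR"]),
        "class": fields["class"],
        "closed": match.group("closed") == "+",
    }


line = ("{b} => {e} (supp=4 [80.00%]; conf=1.000 [100.00%]; "
        "suppL=4 [80.00%]; suppR=4 [80.00%]; class=FF) +")
rule = parse_rule(line)
print(rule["left"], rule["right"], rule["closed"])
# Confidence is the rule support divided by the support of the left side.
print(rule["supp"] / rule["supp_left"])
```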

Further quality measures are available for the rules. They can be displayed with the `-full` or `-measures` switch.

`./core02_assrulex.sh sample/laszlo.rcf 4 50% -names -alg:zart -rule:inf -full`

Example:

```
{b} => {e} (supp=4 [80.00%]; conf=1.000 [100.00%]; suppL=4 [80.00%]; suppR=4 [80.00%]; lift=1.250;
conv=NOT_DEF; dep=0.200; nov=0.160; sat=1.000; prr=NOT_DEF; por=NOT_DEF; chi2=0.703; class=FF) +
```

This means:

- left part of the rule ({b})
- right part of the rule ({e})
- support of the rule (4, i.e. 80%)
- confidence of the rule (1.0, i.e. 100%)
- support of the left part of the rule (4, i.e. 80%)
- support of the right part of the rule (4, i.e. 80%)
- lift (1.250)
- conviction (not defined in the case of exact association rules)
- dependency (0.200)
- novelty (0.160)
- satisfaction (1.000)
- …
- classification of the rule (type FF, i.e. frequent itemset implies frequent itemset)
- is it a closed rule? (in the example the rule is closed)

Notes: when a statistical measure cannot be calculated for a rule, "NOT_DEF" is displayed instead of a value. The '+' at the end means that the rule is closed, i.e. the union of the antecedent and consequent forms a closed itemset.
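Several of the listed values can be recomputed from the four support counts. The Python sketch below assumes the usual textbook definitions of lift, conviction, dependency, novelty and satisfaction. These definitions are an assumption, not taken from Coron's source, although they do reproduce the sample values above for a dataset of 5 objects. The remaining measures (prr, por, chi2) are not covered.

```python
from fractions import Fraction


def measures(n: int, supp: int, supp_left: int, supp_right: int) -> dict:
    """Recompute some rule measures from support counts (assumed definitions)."""
    p_lr = Fraction(supp, n)        # P(L and R)
    p_l = Fraction(supp_left, n)    # P(L)
    p_r = Fraction(supp_right, n)   # P(R)
    conf = p_lr / p_l               # P(R | L)
    return {
        "conf": conf,
        "lift": conf / p_r,
        # Conviction divides by (1 - conf), so it is undefined for exact rules.
        "conv": (1 - p_r) / (1 - conf) if conf != 1 else None,
        "dep": abs(conf - p_r),
        "nov": p_lr - p_l * p_r,
        # Satisfaction divides by P(not R), so it is undefined when P(R) = 1.
        "sat": ((1 - p_r) - (1 - conf)) / (1 - p_r) if p_r != 1 else None,
    }


def show(value) -> str:
    """Format a measure the way the sample output does."""
    return "NOT_DEF" if value is None else f"{float(value):.3f}"


# {b} => {e} in a dataset of 5 objects: supp=4, suppL=4, suppR=4
for name, value in measures(5, 4, 4, 4).items():
    print(f"{name}={show(value)}")
```

For the sample rule this gives lift=1.250, dep=0.200, nov=0.160 and sat=1.000, with conviction reported as NOT_DEF because the confidence is exactly 1.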

With the `-examples` switch one can display the positive and negative examples of each rule. Positive examples are the objects that contain both the left and the right side of the rule. Negative examples are the objects that contain the left side but not the right side of the rule.

`./core02_assrulex.sh sample/laszlo.rcf 2 50% -names -alg:zart -rule:inf -examples`

Sample output:

```
{a} => {b, e} (supp=3 [60.00%]; conf=0.750 [75.00%]; suppL=4 [80.00%]; suppR=4 [80.00%]; class=FF) +
Positive examples (objects that contain left AND right side of the rule):
[o1, o3, o5]
Negative examples (objects that contain left, BUT NOT the right side of the rule):
[o2]
```

Warning! This switch does not work with every algorithm. We only tested it with *Zart*.
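The definition of positive and negative examples can be illustrated with a few lines of Python. The context below is a hypothetical toy dataset, chosen so that it agrees with the statistics and the sample output shown in this document (5 objects, 5 attributes, density 68%). It is not read from `sample/laszlo.rcf`, and the `examples` helper is likewise an assumption made for illustration.

```python
# Hypothetical context: object name -> set of attributes it contains.
CONTEXT = {
    "o1": {"a", "b", "d", "e"},
    "o2": {"a", "c"},
    "o3": {"a", "b", "c", "e"},
    "o4": {"b", "c", "e"},
    "o5": {"a", "b", "c", "e"},
}


def examples(context: dict, left: set, right: set) -> tuple[list, list]:
    """Return (positive, negative) example objects for the rule left => right."""
    positive, negative = [], []
    for obj, items in context.items():
        if not left <= items:
            continue  # the object does not contain the left side at all
        if right <= items:
            positive.append(obj)   # contains left AND right side
        else:
            negative.append(obj)   # contains left, BUT NOT the right side
    return positive, negative


pos, neg = examples(CONTEXT, {"a"}, {"b", "e"})
print(pos)  # ['o1', 'o3', 'o5']
print(neg)  # ['o2']
# supp = number of positive examples; conf = positives / objects with left side
print(len(pos), len(pos) / (len(pos) + len(neg)))
```

With this context the rule {a} => {b, e} has 3 positive examples and 1 negative example, which matches supp=3 and conf=0.750 in the sample output.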