|
CONTENTS | |||
|
|
1. THE APRIORI-T ALGORITHM |
The Apriori-T ARM algorithm uses an Apriori style mechanism to process an input set but using the T-tree data structure developed by Coenen, Leng and Goulbourne (see [1] and [2]).
The T-tree (Total Support Tree) is a compressed set enumeration tree structure. The T-tree differs from more standard set enumeration trees in that the nodes at the same level in any sub-branch are organised into 1-D arrays such that the array indexes represent column numbers. For this purpose it is more convenient to build a "reverse" version of the tree (Figure 1(a)), which permits direct indexing with attribute/column numbers. The T-tree offers two initial advantages over standard set enumeration trees:
The Apriori-T algorithm combines the classic Apriori ARM algorithm of Agrawal and Srikant (1994) with the T-tree data structure. As each level is processed, candidates are added as a new level of the T-tree, their support is counted, and those that do not reach the required threshold of support are subsequently pruned. When the algorithm terminates, the T-tree contains only the large itemsets. At each level, new candidate K itemsets are generated from identified large K-1 itemsets, using the downward closure property of itemsets, which in turn may necessitate the inspection of neighbouring branches in the T-tree to determine if a particular K-1 subset is supported. We refer to this process as X-checking. Note that X-checking adds a computational overhead; offset against the additional effort required to establish whether a candidate K itemset, all of whose K-1 itemsets may not necessarily be supported, is or is not a large itemset. In some cases it is more expedient to assume that those subsets of a candidate K itemset, that are contained in neighbouring branches of the T-tree, are supported than to carry out X-checking.
Figure 1: The T-tree (Total support tree) for the data set {{1,3,4},{2,4,5},{2,4,6}}. Note that, for ease of processing, items/attributes are enumerated commencing with 1
2. THE DOWNLOADABLE APRIORI-T DEMONSTRATOR |
We have developed a small Apriori-T demonstrator GUI that can be downloaded from this site. This comprises two class files:
The input data files used must comprise a set of records, one per line, such that each record consists of an ordered sequence of "space separated" integers (not "comma separated"). Note that the presence of an integer in a record indicates the presence of a boolean attribute in that column number for the record in question. It is assumed that columns are numbered from 1 to N inclusive. Example input file:
1 2 3 4 5 6 1 3 6 2 4 6 1 2 3 |
Table 1: Test data file (5 records, 6 columns)
Figure 2 shows an example session with the Apriori-T demonstrator using the data file given in Table 1 and a support threshold of 5 (5%). Note that three levels were generated in the T-tree (maximum value for K is 3)
Figure 2: Apriori T GUI example session using the data set presented in Table 1 and a support threshold of 5%
3. SOURCE CODE FOR DEMONSTRATOR |
For those that may be interested the Java source code for the demomstrator is avialable here:
This should be compiled in the usual manner:
javac AprioriTgui.java
REFERENCES |
Created and maintained by Frans Coenen. Last updated 21 May 2007