/* -------------------------------------------------------------------------- */ /* */ /* APRIORI-TFP CLASSIFICATION RULE GENERATION */ /* */ /* Frans Coenen */ /* */ /* Tuesday 29 April 2003 */ /* (Revised Tuesday 21/10/003, 8/1/2004, 12/10/2006) */ /* */ /* Department of Computer Science */ /* The University of Liverpool */ /* */ /* -------------------------------------------------------------------------- */ /* Class structure AssocRuleMining | +-- TotalSupportTree | +-- PartialSupportTree | +-- AprioriTFPclass | +-- AprioriTFP_CRgen */ //package lucsKDD_ARM; // Java packages import java.util.*; import java.io.*; // Java GUI packages import javax.swing.*; /** Methods to produce classification rules using a APRIORI-TFP approach with either best first or K best first amtching, CSA (Confidence, Support and Antecedent size) ordering, and T-tree X-checking. Alternative ordering and matching strategies are defined in sub-classes of this class. Code assumes that input dataset is organised such that classifiers are at the end of each record. T-tree is constructed in such a way that on then first itteration all 1-item sets and all 2-item sets for the class branches are generted (in a sense the T-tree is "stepped"). On following itterations level N antecedent (non-class) branches are processed and N+1 class branches. Note: number of classifiers value is stored in the numClasses field. Code supports two approaches: 1) AprioriTFP-CR with 50:50 split of training-test set data. 2) AprioriTFP-CR with TCV. @author Frans Coenen @version 12 October 2006 */ public class AprioriTFP_CRgen extends AprioriTFPclass { /* ------ FIELDS ------ */ // None /* ------ CONSTRUCTORS ------ */ /** Constructor with command line arguments to be process. @param args the command line arguments (array of String instances). */ public AprioriTFP_CRgen(String[] args) { super(args); } /** Constructor with argument from existing instance of class AssocRuleMining. @param armInstance the given instance of the AssocRuleMining class. */ public AprioriTFP_CRgen(AssocRuleMining armInstance) { super(armInstance); } /** Default constructor. */ public AprioriTFP_CRgen() { } /* ------ METHODS ------ */ /*----------------------------------------------------------------------- */ /* */ /* START CLASSIFICATION BEST FIRST MATCHING */ /* */ /*----------------------------------------------------------------------- */ /* START CLASSIFICATION */ /** Starts classification rule generation process using Apriori-TFP, by building a P-tree. @return The classification accuarcay (%). */ public double startClassification() { String s = "START CLASSIFICATION (BEST FIRST), TFPC WITH X-CHEKING " + "AND CSA ORDERING\n-----------------------" + "----------------------------------------\n"; // proceed return(startClassification(s)); } /** Starts classification rule generation process using, Apriori-TFP, by building a P-tree (GUI version). @param tArea the text area to output data to. @return The classification accuarcay (%). */ public double startClassification(JTextArea tArea) { // Set text area textArea = tArea; // proceed return(startClassification()); } /** Starts classification rule generation process using, Apriori-TFP, by building a P-tree (version with input string argument). @param s String to be output to GUI/Command line interface. @return The classification accuarcay (%). */ protected double startClassification(String s) { if (textArea==null) { System.out.print(s); outputLimits(); } else { textArea.append(s); outputLimits(textArea); } // Create P-tree if (textArea==null) createPtree(); else createPtree(textArea); // Generate CRs using Apriori-TFP return(startClassification2()); } /** Continues classification rule generation proces using Apriori-TFP causing T-tree to be generated form P-tree, and ivoking classification testing.

Method called frequently during hill climbing process. @return The classification accuarcay (%). */ protected double startClassification2() { // Generate T-tree, and generate CRs. startClassification3(); // Output generated rule set if requested if (outputRuleSetToFileFlag) outputRulesToFile(); // Test classification using the test set. return(twoDecPlaces(testClassification())); } /*----------------------------------------------------------------------- */ /* */ /* START CLASSIFICATION BEST K MATCHING */ /* */ /*----------------------------------------------------------------------- */ /* START CLASSIFICATION BEST K */ /** Starts classification rule generation process using (with "best K classification"), Apriori-TFP, by building a P-tree. @return The classification accuarcay (%). */ public double startClassificationBestK() { String s = "START CLASSIFICATION (BEST K), TFPC WITH X-CHEKING " + "AND CSA ORDERING\n-----------------------" + "----------------------------------------\n" + "Best K value = " + kValue + "\n"; // Proceed return(startClassificationBestK(s)); } /** Starts classification rule generation process using (with "best K classification"), Apriori-TFP, by building a P-tree (GUI version). @param tArea the text area to output data to. @return The classification accuarcay (%). */ public double startClassificationBestK(JTextArea tArea) { textArea = tArea; // Proceed return(startClassificationBestK()); } /** Starts classification rule generation process using (with "best K classification"), Apriori-TFP, by building a P-tree (version with input string argument). @param s String to be outpurt to GUI/Command line interface.). @return The classification accuarcay (%). */ protected double startClassificationBestK(String s) { if (textArea==null) { System.out.print(s); outputLimits(); } else { textArea.append(s); outputLimits(textArea); } // Create P-tree if (textArea==null) createPtree(); else createPtree(textArea); // Generate CRs using Apriori-TFP return(startClassificationBestK2()); } /** Continues classification rule generation proces (with "best K classification") using Apriori-TFP.

Method called frequently during hill climbing process. @return The classification accuarcay (%). */ protected double startClassificationBestK2() { // Generate T-tree, and generate CRs. startClassification3(); // Test classification using the test set. return(twoDecPlaces(testClassificationBestK())); } /*----------------------------------------------------------------------- */ /* */ /* START CLASSIFICATION BEST FIRST WITH TCV */ /* */ /*----------------------------------------------------------------------- */ /* COMMEMCE TEN CROSS VALIDATION */ /** Start Ten Cross Validation (TCV) process using Apriori-TFP.

Assumes that data has been spilt into 10 equal portions. @return overall accuracy (%). */ public double commenceTCV() { double[] parameters = new double[10]; // Loop through tenths data sets for (int index=0;index<10;index++) { // Create training and test sets createTrainingAndTestDataSets(index); // Mine data, produce T-tree and generate CRs parameters[index] = startClassification(); } // Determine overal accuracy double totalAccu = 0; for (int index=0;indexJTextArea class. */ public void commenceTCVwithOutput(JTextArea tArea) { textArea = tArea; // Proceed commenceTCVwithOutput(); } /*----------------------------------------------------------------------- */ /* */ /* TFPC METHODS */ /* */ /*----------------------------------------------------------------------- */ /** Continues classification rule generation proces using Apriori-TFP by building T-tree. */ protected void startClassification3() { // Calculate minimum support threshold in terms of number of // records in the training set. minSupport = numRowsInTrainingSet*support/100.0; String s = "Support = " + twoDecPlaces(support) + ", Confidence = " + twoDecPlaces(confidence) + "\nMinimum support = " + twoDecPlaces(minSupport) + " (Records)\n"; if (textArea==null) { System.out.print(s); outputLimits(); } else { textArea.append(s); outputLimits(textArea); } // Set rule list to null. Note that startRuleList isdefined in the // AssocRuleMining parent class and is also used tostore Association // Rules (ARS) with respect ARM. startRulelist = null; // Generate T-tree (top level method contained in TotalSupportTree // parent class), and generate CRs. if (textArea==null) createTotalSupportTree(); else createTotalSupportTree(textArea); // Identify default rule. Number rules first as numbering is used to // identify default rule. numberRulesInBinTree(); findDefaultRule(); // Output rule list if (ruleListAttNumOutputFlag) { if (textArea==null) outputRules(); else outputRules(textArea); } if (ruleListSchemaOutputFlag) { if (textArea==null) outputRulesSchema(); else outputRulesSchema(textArea); } } /*----------------------------------------------------------------------- */ /* */ /* T-TREE BUILDING METHODS */ /* */ /*----------------------------------------------------------------------- */ /* CONTINUE CREATION OF T-TREE. */ /** Continues process of generating Total support tree.

Overides method in PartialSupportTree class, distinction is that this method includes test if it is possible to generate more CRs after the genertion of the first level. Remeber that createTtreeTopLevel creates the second level of the class branches of the T-tree as well as the entire top level so we may already have generated some rules aftyer the first pass of the P-tree. */ protected void contCreateTtree() { // Create Top level of T-tree (First pass of dataset). Defined in // in TotlaSupportTree class. createTtreeTopLevel(); // Generate level 2 in T-tree if more CRs can be generated int currentLevel = 2; if (!checkIfNoMoreCRsPossible(currentLevel)) { generateLevel2(); createTtreeLevelN(); } } /* ------------------------------------ */ /* CREATE TOP LEVEL OF T-TREE */ /* ------------------------------------ */ /* CREATE T-TREE TOP LEVEL */ /** Generates level 1 (top) of the T-tree.

Overides method in TotalSupportTree class. Distinction is that this method creates not only the top level for all attributes but also the second level for the class attribute branches. */ protected void createTtreeTopLevel() { // Dimension and initialise top level of T-tree startTtreeRef = new TtreeNode[numOneItemSets+1]; for (int index=1;index<=numOneItemSets;index++) startTtreeRef[index] = new TtreeNode(); // Dimension and initialise second level of T-tree for class attribute // branches for (int classIndex = numOneItemSets-numClasses+1; classIndex<=numOneItemSets;classIndex++) { startTtreeRef[classIndex].childRef = new TtreeNode[classIndex]; for (int index=1;index<=classIndex-1;index++) { startTtreeRef[classIndex].childRef[index] = new TtreeNode(); } } // Add support for each 1 itemset and 2-itemsets in class attribute createTtreeTopLevel2(); // Prune top level, setting any unsupported 1-itemsets to null and // level 2 class attribute branches pruneTopLevel(); // Generate Classification Rules (CRs) int currentLevel = 2; generateCRs(currentLevel); } /* GENERATE T-TREE TOP LEVEL 2 */ /** Commences process to generate top level (singletons) of Ttree by looping through table level by level (row by row)

Distinction between this method and the method which it overides in PartialsupportTree class is that this method does not destroy any part of the P-tree table as this may be required by further calls, e.g. when using a hill climbing approach. */ protected void createTtreeTopLevel2() { numLevelsInTtree = 2; // Step through Ptree table for(int index=1;indexOverides method in PartialSupportTree class, distinction is that this method passess an extra argument (pTreeItemSet) to the createTtreeTopLevel4 which also overides a correspomding PartialSupportTree class method. @param pTreeTableLevel the given level (row) of P-tree table records. */ protected void createTtreeTopLevel3(PtreeRecord[] pTreeTableLevel) { // Loop through level in P-tree table level for(int index=0;index Overides method in PartialSupportTree class, distinction is that this method adds support for the second level of the class attribute branches as well as the entire toplevel @param pTreeNodeLabel the label associated with a P-tree node (not the union of its parent labels). @param pTreeItemSet the Union of the pTreeNodeLabel and all parent node itemSets of the current node. @param pTreeNodeSupport the associated support value.*/ private void createTtreeTopLevel4(short[] pTreeNodeLabel, short[] pTreeItemSet, int pTreeNodeSupport) { // Loop through node label and add support for each 1 itemset for (int index=0;index Operates by prunning antecedent branches at the given level, and class branches at the level+1. @param level the level marker, set to the required level at the start and then decremented by 1 on each recursion (must be at least level 2, i.e not top level as method will not work for to level). @return true if all nodes considered have been pruned (in which case the T-tree generation processing can be stopped), false otherwise. */ private boolean pruneLevelN(int level) { int clsInd = numOneItemSets-numClasses+1; // Proccess Attribute branches boolean attBranchesPruned = true; for (int index=level;index Overides TotalSupportTree method, the distinctuion is that this method produces level 2 for the none class attribute branches and level 3 for the class attributes. */ protected void generateLevel2() { // Set next level flag nextLevelExists=false; // loop through top level (start at index 2 because cannot generate a // level from index 1 as there will be no proceeding attributes, // remember index 0 is unused. int classIndex = numOneItemSets-numClasses+1; for (int index=2;indexFollows an add support, prune, generate loop until there are no more levels to generate. Overides method in TotalSupportTree class, distinctiion between the two methods is that: (i) this version generates CRs as it proceeds, and (ii) stops the generation process once a certain number of frequent sets have been generated. */ protected void createTtreeLevelN() { int currentLevel=2; // Level in main part T-tree, not class branches // Loop while a further level exists while (nextLevelExists) { // Add support addSupportToTtreeLevelN(currentLevel); // Prune unsupported candidate sets (version for classification // rule generation pruneLevelN(currentLevel); // Generate Classification Rules (CRs) from nodes in class branches // in T-tree, hence "currentLevel+1" generateCRs(currentLevel+1); // Check if no more CRs can be generated if (checkIfNoMoreCRsPossible(currentLevel+1)) { break; } // Check number of frequent sets generated so far if (numFrequentSets>MAX_NUM_FREQUENT_SETS) { String s ="Level = " + currentLevel + ", Number of frequent " + "sets (" + numFrequentSets + ") generted so far " + "exceeds limit of " + MAX_NUM_FREQUENT_SETS + ", generation process stopped!\n"; if (textArea==null) System.out.println(s); else textArea.append(s); break; } // Attempt to generate next level nextLevelExists=false; generateLevelN(currentLevel); currentLevel++; // Check antecedent size if (currentLevel > MAX_SIZE_OF_ANTECEDENT) { String s = "Current CR antecedent size (" + currentLevel + ") exceeds limit of " + MAX_SIZE_OF_ANTECEDENT + ", generation process stopped!\n"; if (textArea==null) System.out.println(s); else textArea.append(s); break; } } // End numLevelsInTtree = currentLevel; } /* GENERATE LEVEL N ATTRIBUTES*/ /** Commences process of generating remaining levels in the T-tree (other than levels 1 and 2 for attributes and level 1, 2 and 3 for class attributes), both attribute and class attribute branches. @param level the level marker, set to the required level at the start and then decremented by 1 on each recursion. */ private void generateLevelN(int level) { short[] itemSet = new short[1]; int classIndex = numOneItemSets-numClasses+1; // Proccess Attribute branches for (int index=level+1;indexOverides PartialSupportTree method, distinction is that this method updates attribute braches for level N and also N+1 level in Class attribute branches. @param level the (start) current level. */ protected void addSupportToTtreeLevelN(int level) { int classIndex = numOneItemSets-numClasses+1; // Nested loop to step through P-tree table for (int index1=level;index1Called during level by level construction of T-tree. @param level the desired levle in the T-tree. */ private void generateCRs(int level) { // Loop through classifiers for (int index = numOneItemSets-numClasses+1;index<=numOneItemSets; index++) { // Initialise rule consequent itemset with class label (index) short[] consequent = new short[1]; consequent[0] = (short) index; // Class may be included in so few records that it is unsupported // for the given support threshold, or may have no child nodes if (startTtreeRef[index]!=null && startTtreeRef[index].childRef!=null) generateCRs(level-1,null,consequent,index, startTtreeRef[index].childRef); } // Prune T-tree nodes that no longer required as antecedents // to classification rules. pruneTtreePostCRgen(level); } /* GENERATE CLASSIFICATION RULES */ /** Generates classification rules for a particular level in the T-tree (i.e. of a particular cardinality/size) and for a particular classifier, and places them in an order linked-list of such rules.

Identical rule in AprioriT_CRgen class. @param level the desired level in the T-tree. @param itemSetSofar the label for a T-treenode as generated sofar (set to `null' at start). @param consequent the class for the rules to be generated. @param size the length/size of the current array lavel in the T-tree. @param linkRef the reference to the current array level in the T-tree. */ private void generateCRs(int level, short[] itemSetSofar, short[] consequent, int size, TtreeNode[] linkRef) { // At right level if (level==1) generateCRsRightLevel(itemSetSofar,consequent,size, linkRef); // Wrong level else generateCRsWrongLevel(level,itemSetSofar,consequent,size, linkRef); } /* GENERATE CLASSIFICATION RULES LINKED LIST (RIGHT LEVEL) */ /** Generates classification rules for the give array of T-tree nodes.

Identical rule in AprioriT_CRgen class. @param itemSetSofar the label for a T-treenode as generated sofar (set to `null' at start). @param consequent the class for the rules to be generated. @param size the length/size of the current array lavel in the T-tree. @param linkRef the reference to the current array level in the T-tree. */ protected void generateCRsRightLevel(short[] itemSetSofar, short[] consequent, int size, TtreeNode[] linkRef) { // Step through level and set to null where CR found above minimum // confidence threshold for (int index=1;indexinsertRuleintoRulelist found in // AssocRuleMining super class, (2) set reference // to null so that no further rule are generated for this // stubb. if (confidenceCR >= confidence) { // To insert rule according to confidence use // insertRuleintoRulelist, to insert according to size // of antecedent use insertRuleintoRulelist2. insertRuleIntoRulelist(tempItemSet,consequent, confidenceCR,linkRef[index].support); linkRef[index] = null; } } } } /* GENERATE CLASSIFICATION RULES LINKED LIST (WRONG LEVEL) */ /** Continue process of generating classification rules, p[roveed down child branches of T-tree.

Identical rule in AprioriT_CRgen class. @param level the desired level in the T-tree. @param itemSetSofar the label for a T-treenode as generated sofar (set to 'null' at start). @param consequent the class for the rules to be generated. @param size the length/size of the current array lavel in the T-tree. @param linkRef the reference to the current array level in the T-tree. */ private void generateCRsWrongLevel(int level, short[] itemSetSofar, short[] consequent, int size, TtreeNode[] linkRef) { // Step through array of T-tree nodes for (int index=1;indexUsed for diagnostics only. @param parameters the parameters arry to be output. */ public void outputParametersArray(double[] parameters) { for (int index=0;index