/* -------------------------------------------------------------------------- */ /* */ /* CPAR (CLASSIFICATION BASED ON PREDICTIVE ASSOCIATION RULES) CAR GENERATOR */ /* */ /* Frans Coenen */ /* */ /* Wednesday 11 February 2004 */ /* */ /* Department of Computer Science */ /* The University of Liverpool */ /* */ /* -------------------------------------------------------------------------- */ /* Class structure AssocRuleMining | +-- Classification | +-- PRM_CARgen | +-- CPAR_CARgen */ // Java packages import java.util.*; import java.io.*; /** Methods to produce classification rules using CPAR (Classification based on Predictive Association Rules) algorithm first proposed by Xiaoxin Yin and Jiawei Han. Assumes that input dataset is organised such that classifiers are at the end of each record. Note: number of classifiers value is stored in the numClasses field. @author Frans Coenen @version 11 February 2004 */ /* To compile: javaARMpackc.exe CPAR_CARgen.java */ public class CPAR_CARgen extends PRM_CARgen { /* ------ FIELDS ------ */ // Constants /** Gain similarity ratio */ private final double GAIN_SIMILARITY_RATIO=0.99; /* ------ CONSTRUCTORS ------ */ /** Constructor processes command line arguments. @param args the command line arguments (array of String instances). */ public CPAR_CARgen(String[] args) { super(args); } /* ------ METHODS ------ */ /* START CLASSIFICATION */ /** Starts classification rule generation process. @return The classification accuarcy (%). */ public double startClassification() { System.out.println("START CPAR CLASSIFICATION\n" + "------------------------"); // Set rule list to null. Note that startRuleList is defined in the // AssocRuleMining parent class and is also used to store Association // Rules (ARS) with respect to ARM. currentRlist.startRulelist = null; // Set DataArray and number of classes fields currentRlist.setDataArray(dataArray); currentRlist.setNumClasses(numClasses); // Check for classifier array if (classifiers==null) { System.out.println("ERROR: no classifiers array! To create a " + "classifiers array use createClassifiersArray(), " + "contained in ClassAprioriTserial class, called " + "from the application class."); System.exit(1); } // Start CPAR generation process startCPAR(); // Process rules processRules(); // Test classification using the test set and return accuracy. return(twoDecPlaces(testClassification())); } /* START CPAR CLASSIFICATION */ /** Commences CPAR process to generate CARs.

Proceeds as follows for each class: 1) Generate attribute array (for computational convenience only). 2) Generate P and N example data sets. 2) Calculate weighting threshold. 3) While current weighting for P is greater than threshold: a) Copy P, N and A to: get P', N' and A'. b) If first iteration copy PN to PN', else combine PN' positive weightings from previous iteration and original negative weightings to form new PN array. c) If not possible to generate further antecedent attributes break. d) Generate rule. */ private void startCPAR() { // Generate attribute array attributes = new double[numOneItemSets-numClasses+1][2]; for(int attIndex=0;attIndex totalWeightThreshold) { // Copy P, N, A and PN to P', N', A' and PN' positiveExamples2 = copyExamplesArray(positiveExamples); negativeExamples2 = copyExamplesArray(negativeExamples); pn_array2=copyDouble2Darray(pn_array); attributes2 = copyDouble2Darray(attributes); // It is possible, if no attributes with gain above the // specified minimum are discovered, that the current totalWeight // for P will never be reduced to below the threshold --- therefore // tesy for this situation. if (noValidGainsinPNarray()) break; // Proceed with generation process cparGeneration(null,consequent); } } } /* CPAR GENERATION */ /** Uses the CPAR algorithm to generate a CAR for the given class by examining gain for each available attribute. @param antecedent the rule antecedent sofar. @param consequent the given class. */ private void cparGeneration(short[] antecedent, short[] consequent) { // Determine weighting of previous example data sets. double oldPosWeight = (double) getWeightOfExamples(positiveExamples2); double oldNegWeight = (double) getWeightOfExamples(negativeExamples2); // Loop through attributes array and determine gain if added to rule, // gains placed in second row of attribute array. calculateGains(oldPosWeight,oldNegWeight); // Get index for attribute with best gain int indexOfBestAttribute = getBestGain(); // Inherited method double bestGain = attributes2[indexOfBestAttribute][1]; // if best gain less than user specified minimum stop. if (bestGain <= MIN_BEST_GAIN) { foundRule(antecedent,consequent); return; } // Proceed cparGeneration2(bestGain,antecedent,consequent); } /* CPAR GENERATION 2 */ /** Continues to process example data sets using CPAR algorithm.

loop through attribute array finding gains above the gain threshold and then processes the associated attribute (there mat be several such attributes or none). @param bestGain the highest gain value found on this iteration. @param antecedent the rule antecedent sofar. @param consequent the given class. */ private void cparGeneration2(double bestGain, short[] antecedent, short[] consequent) { // Determine gain threshold above which a gain value is considered // to be sufficiently high to be considered (provided that it is // also above the minimum user specified threshold). double gainThreshold = bestGain*GAIN_SIMILARITY_RATIO; if (gainThreshold < MIN_BEST_GAIN) gainThreshold=MIN_BEST_GAIN; //Loop through attributes array for (int index=1;indexgainThreshold) { ExamplesStruct[] tempPosExamples = copyExamplesArray(positiveExamples2); ExamplesStruct[] tempNegExamples = copyExamplesArray(negativeExamples2); double[][] temp_pn_array = copyDouble2Darray(pn_array2); double[][] temp_attributes = copyDouble2Darray(attributes2); cparGeneration3((short) index,antecedent,consequent); ExamplesStruct[] positiveExamples2 = copyExamplesArray(tempPosExamples); ExamplesStruct[] negativeExamples2 = copyExamplesArray(tempNegExamples); double[][] pn_array2 = copyDouble2Darray(temp_pn_array); double[][] attributes2 =copyDouble2Darray(temp_attributes); } } } /* CPAR GENERATION 3 */ /** Process the given attribute using CPAR algorithm.

Adds attribute to rule so far and repeats. @param attribute the attribute to be added to the current rule antecedent. @param antecedent the rule antecedent sofar. @param consequent the given class. */ private void cparGeneration3(short attribute, short[] antecedent, short[] consequent) { // Confirm by assigning to antecedent and indicating, in attributes // array, that selected attribute is no longer available antecedent = reallocInsert(antecedent,attribute); attributes2[attribute][0]=1.0; // Revise example datasets by removing examples that do not satisfy // the current rule antecedent positiveExamples2=removeExDoNotSatRule(0,antecedent,positiveExamples2); negativeExamples2=removeExDoNotSatRule(1,antecedent,negativeExamples2); // If the negative example rule set is empty the total weighting for // the set will be 0.0 and thus any gain calculated for any attribute // will also be 0.0 therefore in the event of an empty negative example // set we may as well stop here rather than recur and calculate a set // of gains which will not be above the threshold! if (negativeExamples2==null) { foundRule(antecedent,consequent); return; } // Repeat cparGeneration(antecedent,consequent); } /* ------------------------------------------------------------ */ /* */ /* OUTPUT */ /* */ /* ------------------------------------------------------------ */ /* OUTPUT SETTINGS */ /** Outputs command line values provided by user. (Overrides higher level method.) */ protected void outputSettings() { System.out.println("SETTINGS\n----------"); System.out.println("File name = " + fileName); System.out.println("Num. classes = " + numClasses); System.out.println("K value = " + K_VALUE); System.out.println("Min. best gain = " + MIN_BEST_GAIN); System.out.println("Total Weight Factor = " + TOTAL_WEIGHT_FACTOR); System.out.println("Decay factor = " + DECAY_FACROR); System.out.println("Gain similarity ratio = " + GAIN_SIMILARITY_RATIO); System.out.println(); } }