/* -------------------------------------------------------------------------- */ /* */ /* CPAR (CLASSIFICATION BASED ON PREDICTIVE ASSOCIATION RULES) CAR GENERATOR */ /* */ /* Frans Coenen */ /* */ /* Wednesday 11 February 2004 */ /* */ /* Department of Computer Science */ /* The University of Liverpool */ /* */ /* -------------------------------------------------------------------------- */ /* Class structure AssocRuleMining | +-- Classification | +-- PRM_CARgen | +-- CPAR_CARgen */ // Java packages import java.util.*; import java.io.*; /** Methods to produce classification rules using CPAR (Classification based on Predictive Association Rules) algorithm first proposed by Xiaoxin Yin and Jiawei Han. Assumes that input dataset is organised such that classifiers are at the end of each record. Note: number of classifiers value is stored in the numClasses field. @author Frans Coenen @version 11 February 2004 */ /* To compile: javaARMpackc.exe CPAR_CARgen.java */ public class CPAR_CARgen extends PRM_CARgen { /* ------ FIELDS ------ */ // Constants /** Gain similarity ratio */ private final double GAIN_SIMILARITY_RATIO=0.99; /* ------ CONSTRUCTORS ------ */ /** Constructor processes command line arguments. @param args the command line arguments (array of String instances). */ public CPAR_CARgen(String[] args) { super(args); } /* ------ METHODS ------ */ /* START CLASSIFICATION */ /** Starts classification rule generation process. @return The classification accuarcy (%). */ public double startClassification() { System.out.println("START CPAR CLASSIFICATION\n" + "------------------------"); // Set rule list to null. Note that startRuleList is defined in the // AssocRuleMining parent class and is also used to store Association // Rules (ARS) with respect to ARM. currentRlist.startRulelist = null; // Set DataArray and number of classes fields currentRlist.setDataArray(dataArray); currentRlist.setNumClasses(numClasses); // Check for classifier array if (classifiers==null) { System.out.println("ERROR: no classifiers array! To create a " + "classifiers array use createClassifiersArray(), " + "contained in ClassAprioriTserial class, called " + "from the application class."); System.exit(1); } // Start CPAR generation process startCPAR(); // Process rules processRules(); // Test classification using the test set and return accuracy. return(twoDecPlaces(testClassification())); } /* START CPAR CLASSIFICATION */ /** Commences CPAR process to generate CARs.
Proceeds as follows for
each class:
1) Generate attribute array (for computational convenience only).
2) Generate P and N example data sets.
2) Calculate weighting threshold.
3) While current weighting for P is greater than threshold:
a) Copy P, N and A to: get P', N' and A'.
b) If first iteration copy PN to PN', else combine PN' positive weightings
from previous iteration and original negative weightings to form
new PN array.
c) If not possible to generate further antecedent attributes break.
d) Generate rule. */
private void startCPAR() {
// Generate attribute array
attributes = new double[numOneItemSets-numClasses+1][2];
for(int attIndex=0;attIndex loop
through attribute array finding gains above the gain threshold and then
processes the associated attribute (there mat be several such attributes
or none).
@param bestGain the highest gain value found on this iteration.
@param antecedent the rule antecedent sofar.
@param consequent the given class. */
private void cparGeneration2(double bestGain, short[] antecedent,
short[] consequent) {
// Determine gain threshold above which a gain value is considered
// to be sufficiently high to be considered (provided that it is
// also above the minimum user specified threshold).
double gainThreshold = bestGain*GAIN_SIMILARITY_RATIO;
if (gainThreshold < MIN_BEST_GAIN) gainThreshold=MIN_BEST_GAIN;
//Loop through attributes array
for (int index=1;index Adds attribute to
rule so far and repeats.
@param attribute the attribute to be added to the current rule antecedent.
@param antecedent the rule antecedent sofar.
@param consequent the given class. */
private void cparGeneration3(short attribute, short[] antecedent,
short[] consequent) {
// Confirm by assigning to antecedent and indicating, in attributes
// array, that selected attribute is no longer available
antecedent = reallocInsert(antecedent,attribute);
attributes2[attribute][0]=1.0;
// Revise example datasets by removing examples that do not satisfy
// the current rule antecedent
positiveExamples2=removeExDoNotSatRule(0,antecedent,positiveExamples2);
negativeExamples2=removeExDoNotSatRule(1,antecedent,negativeExamples2);
// If the negative example rule set is empty the total weighting for
// the set will be 0.0 and thus any gain calculated for any attribute
// will also be 0.0 therefore in the event of an empty negative example
// set we may as well stop here rather than recur and calculate a set
// of gains which will not be above the threshold!
if (negativeExamples2==null) {
foundRule(antecedent,consequent);
return;
}
// Repeat
cparGeneration(antecedent,consequent);
}
/* ------------------------------------------------------------ */
/* */
/* OUTPUT */
/* */
/* ------------------------------------------------------------ */
/* OUTPUT SETTINGS */
/** Outputs command line values provided by user. (Overrides higher level
method.) */
protected void outputSettings() {
System.out.println("SETTINGS\n----------");
System.out.println("File name = " + fileName);
System.out.println("Num. classes = " + numClasses);
System.out.println("K value = " + K_VALUE);
System.out.println("Min. best gain = " + MIN_BEST_GAIN);
System.out.println("Total Weight Factor = " + TOTAL_WEIGHT_FACTOR);
System.out.println("Decay factor = " + DECAY_FACROR);
System.out.println("Gain similarity ratio = " + GAIN_SIMILARITY_RATIO);
System.out.println();
}
}