/* -------------------------------------------------------------------------- */
/*                                                                            */
/*                           WORD BINARY TREE NODE                            */
/*                                                                            */
/*                                Frans Coenen                                */
/*                                                                            */
/*                          Tuesday 20 December 2005                          */
/*                                                                            */
/*                       Department of Computer Science                       */
/*                        The University of Liverpool                         */
/*                                                                            */
/* -------------------------------------------------------------------------- */

//package lucsKDD_ARM;

/** Class to describe an individual node in the word bin tree.
@author Frans Coenen
@version 20 December 2005 */

public class WordBinTreeNode {

    /* ------------------------------- */
    /*                                 */
    /*             FIELDS              */
    /*                                 */
    /* ------------------------------- */

    /** The word represented at the node. */
    public String word = null;

    /** The branch lexicographically before word. */
    public WordBinTreeNode beforeBranch = null;

    /** The branch lexicographically after word. */
    public WordBinTreeNode afterBranch = null;

    /** The frequency count for the word, i.e. the number of times the word
    occurs in the document set. */
    public int wordFrequency = 1;

    /** The support count for the word, i.e. the number of documents in which
    the word occurs (ignores the number of times that a word appears in a
    single document, thus a count of 1 per document). */
    public int support = 1;

    /** Array in which to store word frequency per class, i.e. the number of
    occasions that the word represented by the node appears in the document
    base per class. */
    public int[] classFrequency = null;

    /** Array in which to store support values per class, i.e. the number of
    occasions that the word represented by the node appears at least once in
    a document per class (ignores frequency of appearance per document). */
    public int[] classSupport = null;

    /** Array in which to store contribution per class (used to identify
    significant words).

    Note that the contribution value is raised by a factor of 10^2. */
    public short[] contribution = null;

    /** Flag set to true if current word has already been considered with
    respect to current document (used only in construction of word bin tree
    when operating in "word support" mode). */
    public boolean notPrerviouslyInDoc = true;

    /** Flag set to true if word is an upper noise word (i.e. above upper
    noise threshold). */
    public boolean isUpperNoiseWord = false;

    /** Flag set to true if word is a lower noise word (i.e. below lower
    noise threshold). */
    public boolean isLowerNoiseWord = false;

    /** Flag set to true if word is an ordinary word, i.e. is not a noise
    word and does not serve to differentiate between classes. */
    public boolean isOrdWord = false;

    /** Flag set to true if word is a significant word, i.e. serves to
    differentiate between classes. */
    public boolean isSigWord = false;

    /** Document numbers in which word appears, used for generating table of
    training set attributes in keyword mode (each attribute representing a
    keyword); not used for test set generation. */
    public int[] docNumbers = new int[1];

    /* ------------------------------------ */
    /*                                      */
    /*             CONSTRUCTORS             */
    /*                                      */
    /* ------------------------------------ */

    /** Four argument constructor.

    Also ensures word is all lower case and defines the classSupport,
    classFrequency and contribution arrays.
    @param newWord the given word.
    @param docNum the document number.
    @param numClasses the number of classes represented in the document base.
    @param classLabel the class label (number) for the current document. */

    public WordBinTreeNode(String newWord, int docNum, int numClasses,
                int classLabel) {
        // Add word and document number
        word = newWord.toLowerCase();
        docNumbers[0] = docNum;

        // Define and initialise class support and frequency arrays
        classSupport   = new int[numClasses];
        classFrequency = new int[numClasses];
        contribution   = new short[numClasses];
        for (int index=0;index