Class ClassifierSubsetEval

All Implemented Interfaces:
Serializable, ErrorBasedMeritEvaluator, SubsetEvaluator, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler

public class ClassifierSubsetEval extends HoldOutSubsetEvaluator implements OptionHandler, ErrorBasedMeritEvaluator
Classifier subset evaluator:

Evaluates attribute subsets on training data or a separate hold out testing set. Uses a classifier to estimate the 'merit' of a set of attributes.

Valid options are:

 -B <classifier>
  class name of the classifier to use for accuracy estimation.
  Place any classifier options LAST on the command line
  following a "--", e.g.:
   -B weka.classifiers.bayes.NaiveBayes ... -- -K
  (default: weka.classifiers.rules.ZeroR)
 -T
  Use the training data to estimate accuracy.
 -H <filename>
  Name of the hold out/test set to estimate accuracy on.
 -percentage-split
  Perform a percentage split on the training data.
  Use in conjunction with -T.
 -P
  Split percentage to use (default = 90).
 -S
  Random seed for percentage split (default = 1).
 -E <DEFAULT|ACC|RMSE|MAE|F-MEAS|AUC|AUPRC|CORR-COEFF>
  Performance evaluation measure to use for selecting attributes.
  (default: DEFAULT, i.e. accuracy for a discrete class and RMSE for a numeric class)
 -IRclass <label | index>
  Optional class value (label or 1-based index) to use in conjunction with
  IR statistics (f-meas, auc or auprc). Omitting this option will use
  the class-weighted average.
 
 Options specific to scheme weka.classifiers.rules.ZeroR:
 
 -output-debug-info
  If set, classifier is run in debug mode and
  may output additional info to the console
 -do-not-check-capabilities
  If set, classifier capabilities are not checked before classifier is built
  (use with caution).
 -num-decimal-places
  The number of decimal places for the output of numbers in the model (default 2).
 -batch-size
  The desired batch size for batch prediction (default 100).
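
The option list above says that classifier options go last, after a "--". As a rough illustration of that layout, the sketch below splits an option array at the first "--". OptionSplitSketch and its two methods are hypothetical helpers written for this example; they are not part of Weka, and Weka's own option parsing may differ in detail.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Weka): shows how "--" divides an option array. */
final class OptionSplitSketch {

    /** Returns the options before the first "--" (those meant for the evaluator). */
    static String[] evaluatorOptions(String[] options) {
        int sep = Arrays.asList(options).indexOf("--");
        return sep < 0 ? options.clone() : Arrays.copyOfRange(options, 0, sep);
    }

    /** Returns the options after the first "--" (those meant for the classifier). */
    static String[] classifierOptions(String[] options) {
        int sep = Arrays.asList(options).indexOf("--");
        return sep < 0 ? new String[0] : Arrays.copyOfRange(options, sep + 1, options.length);
    }

    public static void main(String[] args) {
        String[] options = {
            "-B", "weka.classifiers.bayes.NaiveBayes", // classifier used to estimate merit
            "-T",                                      // evaluate on the training data
            "-E", "AUC",                               // evaluation measure
            "--", "-K"                                 // everything after "--" goes to the classifier
        };
        // prints [-B, weka.classifiers.bayes.NaiveBayes, -T, -E, AUC]
        System.out.println(Arrays.toString(evaluatorOptions(options)));
        // prints [-K]
        System.out.println(Arrays.toString(classifierOptions(options)));
    }
}
```

An array laid out like `options` in `main` is the shape that setOptions(String[]) is documented to accept.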
Version:
$Revision: 10332 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)

  • Constructor Details

    • ClassifierSubsetEval

      public ClassifierSubsetEval()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this attribute evaluator
      Returns:
      a description of the evaluator suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class ASEvaluation
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      The valid options are those listed in the class description above.
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class ASEvaluation
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • seedTipText

      public String seedTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSeed

      public void setSeed(int s)
      Set the random seed used to randomize the data before performing a percentage split
      Parameters:
      s - the seed to use
    • getSeed

      public int getSeed()
      Get the random seed used to randomize the data before performing a percentage split
      Returns:
      the seed to use
    • usePercentageSplitTipText

      public String usePercentageSplitTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setUsePercentageSplit

      public void setUsePercentageSplit(boolean p)
      Set whether to perform a percentage split on the training data for evaluation
      Parameters:
      p - true if a percentage split is to be performed
    • getUsePercentageSplit

      public boolean getUsePercentageSplit()
      Get whether to perform a percentage split on the training data for evaluation
      Returns:
      true if a percentage split is to be performed
    • splitPercentTipText

      public String splitPercentTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSplitPercent

      public void setSplitPercent(String sp)
      Set the split percentage to use
      Parameters:
      sp - the split percentage to use
    • getSplitPercent

      public String getSplitPercent()
      Get the split percentage to use
      Returns:
      the split percentage to use
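
The seed and split percentage properties above control a randomized percentage split. The following is a minimal sketch of that idea on row indices only, assuming the data is shuffled with the seed and the first P percent becomes the training part. PercentageSplitSketch is a hypothetical class, and the rounding rule is an assumption rather than Weka's documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration (not Weka code) of a seeded percentage split. */
final class PercentageSplitSketch {

    /** Shuffles row indices 0..n-1 with the seed and returns [train, test]. */
    static List<List<Integer>> split(int numInstances, String splitPercent, int seed) {
        // The split percentage property is a String, so it is parsed here.
        double percent = Double.parseDouble(splitPercent);
        if (percent <= 0 || percent >= 100) {
            throw new IllegalArgumentException("Split percentage must be between 0 and 100");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            indices.add(i);
        }
        // The same seed always produces the same ordering.
        Collections.shuffle(indices, new Random(seed));
        // Assumption: the training size is rounded to the nearest whole row.
        int trainSize = (int) Math.round(numInstances * percent / 100.0);
        List<Integer> train = new ArrayList<>(indices.subList(0, trainSize));
        List<Integer> test = new ArrayList<>(indices.subList(trainSize, numInstances));
        return Arrays.asList(train, test);
    }

    public static void main(String[] args) {
        // Defaults from the option list: 90 percent split, seed 1.
        List<List<Integer>> parts = split(10, "90", 1);
        System.out.println("train rows: " + parts.get(0).size()); // prints train rows: 9
        System.out.println("test rows: " + parts.get(1).size());  // prints test rows: 1
    }
}
```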
    • setIRClassValue

      public void setIRClassValue(String val)
      Set the class value (label or index) to use with IR metric evaluation of subsets. Leaving this unset will result in the class weighted average for the IR metric being used.
      Parameters:
      val - the class label or 1-based index of the class label to use when evaluating subsets with an IR metric
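
Because the value may be either a label or a 1-based index, it has to be resolved against the class attribute's labels at some point. The sketch below shows one way to do that. IRClassValueSketch is hypothetical, and trying the numeric interpretation before the label lookup is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration (not Weka code) of resolving a label or 1-based index. */
final class IRClassValueSketch {

    /**
     * Returns the zero-based class index, or -1 when the value is unset,
     * which stands for "use the class-weighted average".
     */
    static int resolve(String value, List<String> classLabels) {
        if (value == null || value.trim().isEmpty()) {
            return -1;
        }
        String v = value.trim();
        try {
            // Assumption: a parsable integer is treated as a 1-based index.
            int oneBased = Integer.parseInt(v);
            if (oneBased < 1 || oneBased > classLabels.size()) {
                throw new IllegalArgumentException("Class index out of range: " + v);
            }
            return oneBased - 1;
        } catch (NumberFormatException e) {
            // Otherwise the value is looked up as a class label.
            int index = classLabels.indexOf(v);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown class label: " + v);
            }
            return index;
        }
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList("no", "yes");
        System.out.println(resolve("2", labels));   // prints 1
        System.out.println(resolve("yes", labels)); // prints 1
        System.out.println(resolve("", labels));    // prints -1
    }
}
```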
    • getIRClassValue

      public String getIRClassValue()
      Get the class value (label or index) to use with IR metric evaluation of subsets. Leaving this unset will result in the class weighted average for the IR metric being used.
      Returns:
      the class label or 1-based index of the class label to use when evaluating subsets with an IR metric
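
When no class value is set, the class-weighted average of the IR metric is used. As a worked illustration for the F-measure, the sketch below computes the per-class value from a confusion matrix and weights each class by its number of actual instances. WeightedFMeasureSketch is hypothetical, and the matrix layout (rows are actual classes, columns are predicted classes) is an assumption of this example.

```java
/** Hypothetical illustration (not Weka code) of a class-weighted F-measure. */
final class WeightedFMeasureSketch {

    /** F-measure of class c, where matrix[actual][predicted] holds instance counts. */
    static double fMeasure(int[][] matrix, int c) {
        int tp = matrix[c][c];
        int fp = 0;
        int fn = 0;
        for (int i = 0; i < matrix.length; i++) {
            if (i != c) {
                fp += matrix[i][c]; // other classes predicted as c
                fn += matrix[c][i]; // class c predicted as something else
            }
        }
        int denominator = 2 * tp + fp + fn;
        return denominator == 0 ? 0.0 : 2.0 * tp / denominator;
    }

    /** Average of the per-class F-measures, weighted by actual class frequency. */
    static double weightedFMeasure(int[][] matrix) {
        double weightedSum = 0.0;
        double total = 0.0;
        for (int c = 0; c < matrix.length; c++) {
            int classCount = 0;
            for (int count : matrix[c]) {
                classCount += count;
            }
            weightedSum += classCount * fMeasure(matrix, c);
            total += classCount;
        }
        return total == 0 ? 0.0 : weightedSum / total;
    }

    public static void main(String[] args) {
        int[][] matrix = {
            {8, 2}, // actual class 0: 8 correct, 2 predicted as class 1
            {1, 9}  // actual class 1: 1 predicted as class 0, 9 correct
        };
        System.out.println("class 0: " + fMeasure(matrix, 0)); // 16/19
        System.out.println("class 1: " + fMeasure(matrix, 1)); // 18/21
        System.out.println("weighted: " + weightedFMeasure(matrix));
    }
}
```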
    • IRClassValueTipText

      public String IRClassValueTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • evaluationMeasureTipText

      public String evaluationMeasureTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getEvaluationMeasure

      public SelectedTag getEvaluationMeasure()
      Gets the currently set performance evaluation measure used for selecting attributes
      Returns:
      the performance evaluation measure
    • setEvaluationMeasure

      public void setEvaluationMeasure(SelectedTag newMethod)
      Sets the performance evaluation measure to use for selecting attributes
      Parameters:
      newMethod - the new performance evaluation metric to use
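
The option list describes the default measure as accuracy for a discrete class and RMSE for a numeric class. A minimal sketch of that rule follows. The enum and the method are hypothetical stand-ins for this example; the real class exposes the measure as a SelectedTag rather than an enum.

```java
/** Hypothetical illustration (not Weka code) of how DEFAULT resolves to a concrete measure. */
final class EvaluationMeasureSketch {

    /** Mirrors the values accepted by the -E option. */
    enum Measure { DEFAULT, ACC, RMSE, MAE, F_MEAS, AUC, AUPRC, CORR_COEFF }

    /** Returns the measure actually applied for the given class attribute type. */
    static Measure effective(Measure requested, boolean classIsNominal) {
        if (requested != Measure.DEFAULT) {
            return requested; // an explicit choice is kept as-is
        }
        return classIsNominal ? Measure.ACC : Measure.RMSE;
    }

    public static void main(String[] args) {
        System.out.println(effective(Measure.DEFAULT, true));  // prints ACC
        System.out.println(effective(Measure.DEFAULT, false)); // prints RMSE
        System.out.println(effective(Measure.AUC, true));      // prints AUC
    }
}
```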
    • classifierTipText

      public String classifierTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setClassifier

      public void setClassifier(Classifier newClassifier)
      Set the classifier to use for accuracy estimation
      Parameters:
      newClassifier - the Classifier to use.
    • getClassifier

      public Classifier getClassifier()
      Get the classifier used as the base learner.
      Returns:
      the classifier used as the base learner
    • holdOutFileTipText

      public String holdOutFileTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getHoldOutFile

      public File getHoldOutFile()
      Gets the file that holds hold out/test instances.
      Returns:
      File that contains hold out instances
    • setHoldOutFile

      public void setHoldOutFile(File h)
      Set the file that contains hold out/test instances
      Parameters:
      h - the hold out file
    • useTrainingTipText

      public String useTrainingTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getUseTraining

      public boolean getUseTraining()
      Get if training data is to be used instead of hold out/test data
      Returns:
      true if training data is to be used instead of hold out data
    • setUseTraining

      public void setUseTraining(boolean t)
      Set if training data is to be used instead of hold out/test data
      Parameters:
      t - true if training data is to be used instead of hold out data
    • getOptions

      public String[] getOptions()
      Gets the current settings of ClassifierSubsetEval
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class ASEvaluation
      Returns:
      an array of strings suitable for passing to setOptions()
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the capabilities of this evaluator.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class ASEvaluation
      Returns:
      the capabilities of this evaluator
    • buildEvaluator

      public void buildEvaluator(Instances data) throws Exception
      Generates an attribute evaluator. Has to initialize all fields of the evaluator that are not being set via options.
      Specified by:
      buildEvaluator in class ASEvaluation
      Parameters:
      data - set of instances serving as training data
      Throws:
      Exception - if the evaluator has not been generated successfully
    • evaluateSubset

      public double evaluateSubset(BitSet subset) throws Exception
      Evaluates a subset of attributes
      Specified by:
      evaluateSubset in interface SubsetEvaluator
      Parameters:
      subset - a bitset representing the attribute subset to be evaluated
      Returns:
      the error rate
      Throws:
      Exception - if the subset could not be evaluated
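
The subset parameter is a java.util.BitSet. The sketch below builds one, assuming that bit i being set means the attribute at zero-based index i belongs to the subset. SubsetBitSetSketch and subsetOf are hypothetical names used only for this example.

```java
import java.util.BitSet;

/** Hypothetical helper (not Weka code) for building an attribute subset as a BitSet. */
final class SubsetBitSetSketch {

    /** Returns a BitSet with one bit set per selected zero-based attribute index. */
    static BitSet subsetOf(int numAttributes, int... selected) {
        BitSet subset = new BitSet(numAttributes);
        for (int index : selected) {
            if (index < 0 || index >= numAttributes) {
                throw new IndexOutOfBoundsException("No attribute at index " + index);
            }
            subset.set(index);
        }
        return subset;
    }

    public static void main(String[] args) {
        // Select the first, third and sixth of six attributes.
        BitSet subset = subsetOf(6, 0, 2, 5);
        System.out.println(subset);               // prints {0, 2, 5}
        System.out.println(subset.cardinality()); // prints 3
    }
}
```

A BitSet built this way is the kind of argument the evaluateSubset methods take.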
    • evaluateSubset

      public double evaluateSubset(BitSet subset, Instances holdOut) throws Exception
      Evaluates a subset of attributes with respect to a set of instances. Calling this function overrides any test/hold out instances set from setHoldOutFile.
      Specified by:
      evaluateSubset in class HoldOutSubsetEvaluator
      Parameters:
      subset - a bitset representing the attribute subset to be evaluated
      holdOut - a set of instances (possibly separate and distinct from those used to build/train the evaluator) with which to evaluate the merit of the subset
      Returns:
      the "merit" of the subset on the holdOut data
      Throws:
      Exception - if the subset cannot be evaluated
    • evaluateSubset

      public double evaluateSubset(BitSet subset, Instance holdOut, boolean retrain) throws Exception
      Evaluates a subset of attributes with respect to a single instance. Calling this function overrides any hold out/test instances set through setHoldOutFile.
      Specified by:
      evaluateSubset in class HoldOutSubsetEvaluator
      Parameters:
      subset - a bitset representing the attribute subset to be evaluated
      holdOut - a single instance (possibly not one of those used to build/train the evaluator) with which to evaluate the merit of the subset
      retrain - true if the classifier should be retrained with respect to the new subset before testing on the holdOut instance.
      Returns:
      the "merit" of the subset on the holdOut instance
      Throws:
      Exception - if the subset cannot be evaluated
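
The evaluateSubset methods each return a merit for one candidate subset, so a search procedure typically calls them many times. The sketch below shows one such procedure, a greedy forward selection, with the merit supplied as a plain function. ForwardSelectionSketch is hypothetical and stands in for a search method; the toy merit function in main is invented for the example, and higher merit is assumed to be better.

```java
import java.util.BitSet;
import java.util.function.ToDoubleFunction;

/** Hypothetical illustration (not Weka code) of a search driven by subset merit. */
final class ForwardSelectionSketch {

    /** Repeatedly adds the single attribute that raises the merit most, until none does. */
    static BitSet select(int numAttributes, ToDoubleFunction<BitSet> merit) {
        BitSet best = new BitSet(numAttributes);
        double bestMerit = merit.applyAsDouble(best);
        boolean improved = true;
        while (improved) {
            improved = false;
            int bestAttribute = -1;
            for (int a = 0; a < numAttributes; a++) {
                if (best.get(a)) {
                    continue; // already in the subset
                }
                BitSet candidate = (BitSet) best.clone();
                candidate.set(a);
                double m = merit.applyAsDouble(candidate);
                if (m > bestMerit) {
                    bestMerit = m;
                    bestAttribute = a;
                }
            }
            if (bestAttribute >= 0) {
                best.set(bestAttribute);
                improved = true;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy merit: attributes 1 and 3 are useful, and every attribute has a small cost.
        ToDoubleFunction<BitSet> merit = s ->
            (s.get(1) ? 1.0 : 0.0) + (s.get(3) ? 1.0 : 0.0) - 0.1 * s.cardinality();
        System.out.println(select(4, merit)); // prints {1, 3}
    }
}
```

In a real setting, the merit function would be replaced by a call to an evaluator's evaluateSubset method, with its checked exception handled by the caller.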
    • toString

      public String toString()
      Returns a string describing this ClassifierSubsetEval
      Overrides:
      toString in class Object
      Returns:
      the description as a string
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class ASEvaluation
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for testing this class.
      Parameters:
      args - the options