weka.attributeSelection.ClassifierSubsetEval

All Implemented Interfaces: Serializable, ErrorBasedMeritEvaluator, SubsetEvaluator, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler

public class ClassifierSubsetEval extends HoldOutSubsetEvaluator implements OptionHandler, ErrorBasedMeritEvaluator

Classifier subset evaluator:

Evaluates attribute subsets on training data or a separate hold out testing set. Uses a classifier to estimate the 'merit' of a set of attributes.
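As a rough illustration of the idea described above (score an attribute subset by how well a classifier performs when it sees only those attributes), here is a minimal, library-free sketch. The class `WrapperMeritSketch`, its methods, and the 1-nearest-neighbour learner are hypothetical stand-ins chosen for brevity; they are not Weka's implementation, and the sketch assumes that bit i of the BitSet selects the attribute at 0-based index i.

```java
import java.util.BitSet;

// Hypothetical illustration only: a toy "wrapper" merit estimate, not Weka code.
final class WrapperMeritSketch {

    /** Squared Euclidean distance restricted to the attributes whose bit is set. */
    static double distance(double[] a, double[] b, BitSet subset) {
        double sum = 0.0;
        for (int i = subset.nextSetBit(0); i >= 0; i = subset.nextSetBit(i + 1)) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Predicts the label of the closest training row (1-nearest-neighbour). */
    static int predict(double[][] trainX, int[] trainY, double[] query, BitSet subset) {
        int best = 0;
        for (int r = 1; r < trainX.length; r++) {
            if (distance(trainX[r], query, subset) < distance(trainX[best], query, subset)) {
                best = r;
            }
        }
        return trainY[best];
    }

    /** Merit of a subset: accuracy on the hold-out rows using only the selected attributes. */
    static double merit(double[][] trainX, int[] trainY,
                        double[][] holdX, int[] holdY, BitSet subset) {
        int correct = 0;
        for (int r = 0; r < holdX.length; r++) {
            if (predict(trainX, trainY, holdX[r], subset) == holdY[r]) {
                correct++;
            }
        }
        return (double) correct / holdX.length;
    }

    public static void main(String[] args) {
        // Attribute 0 separates the classes; attribute 1 is misleading noise.
        double[][] trainX = {{0.0, 5.0}, {1.0, 0.0}};
        int[] trainY = {0, 1};
        double[][] holdX = {{0.1, 0.0}, {0.9, 5.0}};
        int[] holdY = {0, 1};

        BitSet onlyFirst = new BitSet();
        onlyFirst.set(0);
        BitSet onlySecond = new BitSet();
        onlySecond.set(1);

        System.out.println("merit {0} = " + merit(trainX, trainY, holdX, holdY, onlyFirst));
        System.out.println("merit {1} = " + merit(trainX, trainY, holdX, holdY, onlySecond));
    }
}
```

In this toy data the subset containing only the informative attribute scores higher than the subset containing only the noisy one, which is the kind of signal a search method would use to compare subsets.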

Valid options are:

 -B <classifier>
  class name of the classifier to use for accuracy estimation.
  Place any classifier options LAST on the command line
  following a "--". e.g.:
   -B weka.classifiers.bayes.NaiveBayes ... -- -K
  (default: weka.classifiers.rules.ZeroR)

 -T
  Use the training data to estimate accuracy.

 -H <filename>
  Name of the hold out/test set to 
  estimate accuracy on.

 -percentage-split
  Perform a percentage split on the training data.
  Use in conjunction with -T.

 -P
  Split percentage to use (default = 90).

 -S
  Random seed for percentage split (default = 1).

 -E <DEFAULT|ACC|RMSE|MAE|F-MEAS|AUC|AUPRC|CORR-COEFF>
  Performance evaluation measure to use for selecting attributes.
  (Default = default: accuracy for discrete class and rmse for numeric class)

 -IRclass <label | index>
  Optional class value (label or 1-based index) to use in conjunction with
  IR statistics (f-meas, auc or auprc). Omitting this option will use
  the class-weighted average.

 
 Options specific to scheme weka.classifiers.rules.ZeroR:

 -output-debug-info
  If set, classifier is run in debug mode and
  may output additional info to the console

 -do-not-check-capabilities
  If set, classifier capabilities are not checked before classifier is built
  (use with caution).

 -num-decimal-places
  The number of decimal places for the output of numbers in the model (default 2).

 -batch-size
  The desired batch size for batch prediction (default 100).
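The ordering rule in the option list above (evaluator flags first, classifier options last, after a "--") can be sketched as a small helper that assembles the String[] one would hand to setOptions. The class `SubsetEvalOptionsSketch` and its `build` method are hypothetical names introduced for this example; the flags and the value tokens are copied from the list above, and which letter case the -E value must use is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: only assembles strings, it does not call Weka.
final class SubsetEvalOptionsSketch {

    /**
     * Builds an option array: evaluator flags first, then "--",
     * then the options meant for the base classifier.
     */
    static String[] build(String classifier, String measure,
                          int splitPercent, int seed, String... classifierOptions) {
        List<String> opts = new ArrayList<>();
        opts.add("-B");
        opts.add(classifier);
        opts.add("-T");                  // estimate on the training data ...
        opts.add("-percentage-split");   // ... using a percentage split of it
        opts.add("-P");
        opts.add(Integer.toString(splitPercent));
        opts.add("-S");
        opts.add(Integer.toString(seed));
        opts.add("-E");
        opts.add(measure);
        if (classifierOptions.length > 0) {
            opts.add("--");              // everything after this goes to the classifier
            opts.addAll(Arrays.asList(classifierOptions));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = build("weka.classifiers.bayes.NaiveBayes", "ACC", 90, 1, "-K");
        System.out.println(String.join(" ", opts));
    }
}
```

With Weka on the classpath, the resulting array would presumably be passed to setOptions(String[]) on a ClassifierSubsetEval instance; that call is left out so the sketch compiles on its own.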

Version:

$Revision: 10332 $

Author:

Mark Hall (mhall@cs.waikato.ac.nz)

See Also:

Serialized Form

Field Summary

Fields

Modifier and Type Field
  Description

static final int EVAL_ACCURACY

static final int EVAL_AUC

static final int EVAL_AUPRC

static final int EVAL_CORRELATION

static final int EVAL_DEFAULT

static final int EVAL_FMEASURE

static final int EVAL_MAE

static final int EVAL_PLUGIN

static final int EVAL_RMSE

static final Tag[] TAGS_EVALUATION
  Holds all tags for metrics

Constructor Summary

Constructors

Constructor
  Description

ClassifierSubsetEval()

Method Summary

Modifier and Type Method
  Description

void buildEvaluator(Instances data)
  Generates an attribute evaluator.

String classifierTipText()
  Returns the tip text for this property

double evaluateSubset(BitSet subset)
  Evaluates a subset of attributes

double evaluateSubset(BitSet subset, Instance holdOut, boolean retrain)
  Evaluates a subset of attributes with respect to a single instance.

double evaluateSubset(BitSet subset, Instances holdOut)
  Evaluates a subset of attributes with respect to a set of instances.

String evaluationMeasureTipText()
  Returns the tip text for this property

Capabilities getCapabilities()
  Returns the capabilities of this evaluator.

Classifier getClassifier()
  Get the classifier used as the base learner.

SelectedTag getEvaluationMeasure()
  Gets the currently set performance evaluation measure used for selecting attributes

File getHoldOutFile()
  Gets the file that holds hold out/test instances.

String getIRClassValue()
  Get the class value (label or index) to use with IR metric evaluation of subsets.

String[] getOptions()
  Gets the current settings of ClassifierSubsetEval

String getRevision()
  Returns the revision string.

int getSeed()
  Get the random seed used to randomize the data before performing a percentage split

String getSplitPercent()
  Get the split percentage to use

boolean getUsePercentageSplit()
  Get whether to perform a percentage split on the training data for evaluation

boolean getUseTraining()
  Get if training data is to be used instead of hold out/test data

String globalInfo()
  Returns a string describing this attribute evaluator

String holdOutFileTipText()
  Returns the tip text for this property

String IRClassValueTipText()
  Returns the tip text for this property

Enumeration<Option> listOptions()
  Returns an enumeration describing the available options.

static void main(String[] args)
  Main method for testing this class.

String seedTipText()
  Returns the tip text for this property

void setClassifier(Classifier newClassifier)
  Set the classifier to use for accuracy estimation

void setEvaluationMeasure(SelectedTag newMethod)
  Sets the performance evaluation measure to use for selecting attributes

void setHoldOutFile(File h)
  Set the file that contains hold out/test instances

void setIRClassValue(String val)
  Set the class value (label or index) to use with IR metric evaluation of subsets.

void setOptions(String[] options)
  Parses a given list of options.

void setSeed(int s)
  Set the random seed used to randomize the data before performing a percentage split

void setSplitPercent(String sp)
  Set the split percentage to use

void setUsePercentageSplit(boolean p)
  Set whether to perform a percentage split on the training data for evaluation

void setUseTraining(boolean t)
  Set if training data is to be used instead of hold out/test data

String splitPercentTipText()
  Returns the tip text for this property

String toString()
  Returns a string describing classifierSubsetEval

String usePercentageSplitTipText()
  Returns the tip text for this property

String useTrainingTipText()
  Returns the tip text for this property

Methods inherited from class weka.attributeSelection.ASEvaluation
clean, doNotCheckCapabilitiesTipText, forName, getDoNotCheckCapabilities, makeCopies, postExecution, postProcess, preExecution, run, runEvaluator, setDoNotCheckCapabilities

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- EVAL_DEFAULT
  
  public static final int EVAL_DEFAULT
  See Also:
  
  Constant Field Values
- EVAL_ACCURACY
  
  public static final int EVAL_ACCURACY
  See Also:
  
  Constant Field Values
- EVAL_RMSE
  
  public static final int EVAL_RMSE
  See Also:
  
  Constant Field Values
- EVAL_MAE
  
  public static final int EVAL_MAE
  See Also:
  
  Constant Field Values
- EVAL_FMEASURE
  
  public static final int EVAL_FMEASURE
  See Also:
  
  Constant Field Values
- EVAL_AUC
  
  public static final int EVAL_AUC
  See Also:
  
  Constant Field Values
- EVAL_AUPRC
  
  public static final int EVAL_AUPRC
  See Also:
  
  Constant Field Values
- EVAL_CORRELATION
  
  public static final int EVAL_CORRELATION
  See Also:
  
  Constant Field Values
- EVAL_PLUGIN
  
  public static final int EVAL_PLUGIN
  See Also:
  
  Constant Field Values
- TAGS_EVALUATION
  
  public static final Tag[] TAGS_EVALUATION
  
  Holds all tags for metrics
Constructor Details
- ClassifierSubsetEval
  
  public ClassifierSubsetEval()
Method Details
- globalInfo
  
  public String globalInfo()
  
  Returns a string describing this attribute evaluator
  
  Returns:
  
  a description of the evaluator suitable for displaying in the explorer/experimenter gui
- listOptions
  
  public Enumeration<Option> listOptions()
  
  Returns an enumeration describing the available options.
  
  Specified by:
  
  listOptions in interface OptionHandler
  
  Overrides:
  
  listOptions in class ASEvaluation
  
  Returns:
  
  an enumeration of all the available options.
- setOptions
  
  public void setOptions(String[] options) throws Exception
  Parses a given list of options.
  Valid options are:
  
  -B <classifier> class name of the classifier to use for accuracy estimation. Place any classifier options LAST on the command line following a "--". e.g.: -B weka.classifiers.bayes.NaiveBayes ... -- -K (default: weka.classifiers.rules.ZeroR)
  
  -T Use the training data to estimate accuracy.
  
  -H <filename> Name of the hold out/test set to estimate accuracy on.
  
  -percentage-split Perform a percentage split on the training data. Use in conjunction with -T.
  
  -P Split percentage to use (default = 90).
  
  -S Random seed for percentage split (default = 1).
  
  -E <DEFAULT|ACC|RMSE|MAE|F-MEAS|AUC|AUPRC|CORR-COEFF> Performance evaluation measure to use for selecting attributes. (Default = default: accuracy for discrete class and rmse for numeric class)
  
  -IRclass <label | index> Optional class value (label or 1-based index) to use in conjunction with IR statistics (f-meas, auc or auprc). Omitting this option will use the class-weighted average.
  
  Options specific to scheme weka.classifiers.rules.ZeroR:
  
  -output-debug-info If set, classifier is run in debug mode and may output additional info to the console
  
  -do-not-check-capabilities If set, classifier capabilities are not checked before classifier is built (use with caution).
  
  -num-decimal-places The number of decimal places for the output of numbers in the model (default 2).
  
  -batch-size The desired batch size for batch prediction (default 100).
  Specified by:
  
  setOptions in interface OptionHandler
  
  Overrides:
  
  setOptions in class ASEvaluation
  
  Parameters:
  
  options - the list of options as an array of strings
  
  Throws:
  
  Exception - if an option is not supported
- seedTipText
  
  public String seedTipText()
  
  Returns the tip text for this property
  
  Returns:
  
  tip text for this property suitable for displaying in the explorer/experimenter gui
- setSeed
  
  public void setSeed(int s)
  
  Set the random seed used to randomize the data before performing a percentage split
  
  Parameters:
  
  s - the seed to use
- getSeed
  
  public int getSeed()
  
  Get the random seed used to randomize the data before performing a percentage split
  
  Returns:
  
  the seed to use
- usePercentageSplitTipText
  
  public String usePercentageSplitTipText()
  
  Returns the tip text for this property
  
  Returns:
  
  tip text for this property suitable for displaying in the explorer/experimenter gui
- setUsePercentageSplit
  
  public void setUsePercentageSplit(boolean p)
  
  Set whether to perform a percentage split on the training data for evaluation
  
  Parameters:
  
  p - true if a percentage split is to be performed
- getUsePercentageSplit
  
  public boolean getUsePercentageSplit()
  
  Get whether to perform a percentage split on the training data for evaluation
  
  Returns:
  
  true if a percentage split is to be performed
- splitPercentTipText
  
  public String splitPercentTipText()
  
  Returns the tip text for this property
  
  Returns:
  
  tip text for this property suitable for displaying in the explorer/experimenter gui
- setSplitPercent
  
  public void setSplitPercent(String sp)
  
  Set the split percentage to use
  
  Parameters:
  
  sp - the split percentage to use
- getSplitPercent
  
  public String getSplitPercent()
  
  Get the split percentage to use
  
  Returns:
  
  the split percentage to use
- setIRClassValue
  
  public void setIRClassValue(String val)
  
  Set the class value (label or index) to use with IR metric evaluation of subsets. Leaving this unset will result in the class weighted average for the IR metric being used.
  
  Parameters:
  
  val - the class label or 1-based index of the class label to use when evaluating subsets with an IR metric
- getIRClassValue
  
  public String getIRClassValue()
  
  Get the class value (label or index) to use with IR metric evaluation of subsets. Leaving this unset will result in the class weighted average for the IR metric being used.
  
  Returns:
  
  the class label or 1-based index of the class label to use when evaluating subsets with an IR metric
- IRClassValueTipText
  
  public String IRClassValueTipText()
  
  Returns the tip text for this property
  
  Returns:
  
  tip text for this property suitable for displaying in the explorer/experimenter gui
- evaluationMeasureTipText
  
  public String evaluationMeasureTipText()
  
  Returns the tip text for this property
  
  Returns:
  
  tip text for this property suitable for displaying in the explorer/experimenter gui
- getEvaluationMeasure
  
  public SelectedTag getEvaluationMeasure()
  
  Gets the currently set performance evaluation measure used for selecting attributes
  
  Returns:
  
  the performance evaluation measure
- setEvaluationMeasure
  
  public void setEvaluationMeasure(SelectedTag newMethod)
  
  Sets the performance evaluation measure to use for selecting attributes
  
  Parameters:
  
  newMethod - the new performance evaluation metric to use
- classifierTipText
  
  public String classifierTipText()
  
  Returns the tip text for this property
  
  Returns:
  
  tip text for this property suitable for displaying in the explorer/experimenter gui
- setClassifier
  
  public void setClassifier(Classifier newClassifier)
  
  Set the classifier to use for accuracy estimation
  
  Parameters:
  
  newClassifier - the Classifier to use.
- getClassifier
  
  public Classifier getClassifier()
  
  Get the classifier used as the base learner.
  
  Returns:
  
  the classifier used as the base learner
- holdOutFileTipText
  
  public String holdOutFileTipText()
  
  Returns the tip text for this property
  
  Returns:
  
  tip text for this property suitable for displaying in the explorer/experimenter gui
- getHoldOutFile
  
  public File getHoldOutFile()
  
  Gets the file that holds hold out/test instances.
  
  Returns:
  
  File that contains hold out instances
- setHoldOutFile
  
  public void setHoldOutFile(File h)
  
  Set the file that contains hold out/test instances
  
  Parameters:
  
  h - the hold out file
- useTrainingTipText
  
  public String useTrainingTipText()
  
  Returns the tip text for this property
  
  Returns:
  
  tip text for this property suitable for displaying in the explorer/experimenter gui
- getUseTraining
  
  public boolean getUseTraining()
  
  Get if training data is to be used instead of hold out/test data
  
  Returns:
  
  true if training data is to be used instead of hold out data
- setUseTraining
  
  public void setUseTraining(boolean t)
  
  Set if training data is to be used instead of hold out/test data
  
  Parameters:
  
  t - true if training data is to be used instead of hold out data
- getOptions
  
  public String[] getOptions()
  
  Gets the current settings of ClassifierSubsetEval
  
  Specified by:
  
  getOptions in interface OptionHandler
  
  Overrides:
  
  getOptions in class ASEvaluation
  
  Returns:
  
  an array of strings suitable for passing to setOptions()
- getCapabilities
  
  public Capabilities getCapabilities()
  
  Returns the capabilities of this evaluator.
  Specified by:
  
  getCapabilities in interface CapabilitiesHandler
  
  Overrides:
  
  getCapabilities in class ASEvaluation
  
  Returns:
  
  the capabilities of this evaluator
  
  See Also:
  
  Capabilities
- buildEvaluator
  
  public void buildEvaluator(Instances data) throws Exception
  
  Generates an attribute evaluator. Has to initialize all fields of the evaluator that are not being set via options.
  
  Specified by:
  
  buildEvaluator in class ASEvaluation
  
  Parameters:
  
  data - set of instances serving as training data
  
  Throws:
  
  Exception - if the evaluator has not been generated successfully
- evaluateSubset
  
  public double evaluateSubset(BitSet subset) throws Exception
  
  Evaluates a subset of attributes
  
  Specified by:
  
  evaluateSubset in interface SubsetEvaluator
  
  Parameters:
  
  subset - a bitset representing the attribute subset to be evaluated
  
  Returns:
  
  the error rate
  
  Throws:
  
  Exception - if the subset could not be evaluated
- evaluateSubset
  
  public double evaluateSubset(BitSet subset, Instances holdOut) throws Exception
  
  Evaluates a subset of attributes with respect to a set of instances. Calling this function overrides any test/hold out instances set from setHoldOutFile.
  
  Specified by:
  
  evaluateSubset in class HoldOutSubsetEvaluator
  
  Parameters:
  
  subset - a bitset representing the attribute subset to be evaluated
  
  holdOut - a set of instances (possibly separate and distinct from those used to build/train the evaluator) with which to evaluate the merit of the subset
  
  Returns:
  
  the "merit" of the subset on the holdOut data
  
  Throws:
  
  Exception - if the subset cannot be evaluated
- evaluateSubset
  
  public double evaluateSubset(BitSet subset, Instance holdOut, boolean retrain) throws Exception
  
  Evaluates a subset of attributes with respect to a single instance. Calling this function overrides any hold out/test instances set through setHoldOutFile.
  
  Specified by:
  
  evaluateSubset in class HoldOutSubsetEvaluator
  
  Parameters:
  
  subset - a bitset representing the attribute subset to be evaluated
  
  holdOut - a single instance (possibly not one of those used to build/train the evaluator) with which to evaluate the merit of the subset
  
  retrain - true if the classifier should be retrained with respect to the new subset before testing on the holdOut instance.
  
  Returns:
  
  the "merit" of the subset on the holdOut instance
  
  Throws:
  
  Exception - if the subset cannot be evaluated
- toString
  
  public String toString()
  
  Returns a string describing classifierSubsetEval
  
  Overrides:
  
  toString in class Object
  
  Returns:
  
  the description as a string
- getRevision
  
  public String getRevision()
  
  Returns the revision string.
  
  Specified by:
  
  getRevision in interface RevisionHandler
  
  Overrides:
  
  getRevision in class ASEvaluation
  
  Returns:
  
  the revision
- main
  
  public static void main(String[] args)
  
  Main method for testing this class.
  
  Parameters:
  
  args - the options
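The percentage-split settings documented above (setUsePercentageSplit, setSplitPercent, setSeed) amount to a seeded shuffle followed by a cut. The sketch below shows that idea with plain Java collections. `PercentageSplitSketch` and `Split` are hypothetical names, Collections.shuffle stands in for whatever randomization Weka applies, and the sketch assumes the percentage denotes the training share.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of a seeded percentage split; not Weka code.
final class PercentageSplitSketch {

    /** Holds the two parts of a split. */
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /**
     * Shuffles a copy of the rows with the given seed, then keeps the first
     * splitPercent percent for training and the remainder for evaluation.
     * The input list is left untouched.
     */
    static <T> Split<T> split(List<T> rows, double splitPercent, int seed) {
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed));
        int trainSize = (int) Math.round(shuffled.size() * splitPercent / 100.0);
        return new Split<>(
            new ArrayList<>(shuffled.subList(0, trainSize)),
            new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        // Defaults taken from the option list: 90 percent, seed 1.
        Split<Integer> s = split(rows, 90.0, 1);
        System.out.println("train=" + s.train.size() + " test=" + s.test.size());
    }
}
```

Using the same seed twice gives the same partition, which is why a fixed seed makes subset evaluations comparable across runs.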
