Class CostSensitiveClassifier

All Implemented Interfaces:
Serializable, Cloneable, Classifier, BatchPredictor, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, Drawable, OptionHandler, Randomizable, RevisionHandler, WeightedInstancesHandler

A metaclassifier that makes its base classifier cost sensitive. Two methods can be used to introduce cost-sensitivity: reweighting training instances according to the total cost assigned to each class, or predicting the class with minimum expected misclassification cost (rather than the most likely class). Performance can often be improved by using a bagged classifier to obtain better probability estimates from the base classifier. If the base classifier cannot handle instance weights, and the instance weights are not uniform, the data will be resampled with replacement based on the weights before being passed to the base classifier.
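The reweighting method can be illustrated in plain Java without any Weka dependency. This is a minimal sketch, not Weka's implementation. It assumes the cost matrix is indexed as `costs[actual][predicted]`, that each instance of class i is scaled by the total cost in row i (the cost of misclassifying class i), and that the weights are then renormalized so their sum is unchanged. The class name `CostReweightSketch` is hypothetical.

```java
import java.util.Arrays;

/**
 * Sketch of cost-based instance reweighting (hypothetical, not Weka's code).
 * Assumes costs[actual][predicted]; class i is weighted by the sum of row i.
 */
public class CostReweightSketch {

    /** Returns new weights for instances whose class indices are given in classOf. */
    static double[] reweight(int[] classOf, double[] weights, double[][] costs) {
        int numClasses = costs.length;
        // Total cost of misclassifying each actual class (row sums).
        double[] rowCost = new double[numClasses];
        for (int i = 0; i < numClasses; i++) {
            for (int j = 0; j < numClasses; j++) {
                rowCost[i] += costs[i][j];
            }
        }
        double oldTotal = 0;
        double newTotal = 0;
        double[] result = new double[weights.length];
        for (int k = 0; k < weights.length; k++) {
            result[k] = weights[k] * rowCost[classOf[k]];
            oldTotal += weights[k];
            newTotal += result[k];
        }
        if (newTotal == 0) {
            return weights.clone(); // all-zero costs: leave weights unchanged
        }
        // Rescale so the total weight matches the original total.
        for (int k = 0; k < result.length; k++) {
            result[k] *= oldTotal / newTotal;
        }
        return result;
    }

    public static void main(String[] args) {
        // Misclassifying class 1 costs 10; misclassifying class 0 costs 1.
        double[][] costs = {{0, 1}, {10, 0}};
        int[] classOf = {0, 0, 1, 1};
        double[] weights = {1, 1, 1, 1};
        System.out.println(Arrays.toString(reweight(classOf, weights, costs)));
    }
}
```

With this matrix, each class-1 instance ends up with ten times the weight of a class-0 instance, while the four weights still sum to 4.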

Valid options are:

 -M
  Minimize expected misclassification cost. Default is to
  reweight training instances according to costs per class
 -C <cost file name>
  File name of a cost matrix to use. If this is not supplied,
  a cost matrix will be loaded on demand. The name of the
  on-demand file is the relation name of the training data
  plus ".cost", and the path to the on-demand file is
  specified with the -N option.
 -N <directory>
  Name of a directory to search for cost files when loading
  costs on demand (default current directory).
 -cost-matrix <matrix>
  The cost matrix in Matlab single line format.
 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
 -W
  Full name of base classifier.
  (default: weka.classifiers.rules.ZeroR)
 
 Options specific to classifier weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console
Options after -- are passed to the designated classifier.
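The `-cost-matrix` option takes the matrix on a single line in Matlab style, with rows separated by semicolons, for example `[0 1; 10 0]`. The hypothetical parser below sketches how such a string maps to a two-dimensional array. It is not Weka's parser, and accepting commas as well as spaces between entries is an assumption borrowed from Matlab syntax.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for a Matlab-style single-line matrix such as "[0 1; 10 0]". */
public class MatlabMatrixSketch {

    static double[][] parse(String text) {
        String body = text.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("Expected [ ... ]: " + text);
        }
        body = body.substring(1, body.length() - 1);
        List<double[]> rows = new ArrayList<>();
        // Rows are separated by semicolons.
        for (String rowText : body.split(";")) {
            String trimmed = rowText.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Entries are separated by whitespace or commas (assumption).
            String[] cells = trimmed.split("[\\s,]+");
            double[] row = new double[cells.length];
            for (int j = 0; j < cells.length; j++) {
                row[j] = Double.parseDouble(cells[j]);
            }
            rows.add(row);
        }
        // A cost matrix has one row and one column per class, so it must be square.
        for (double[] row : rows) {
            if (row.length != rows.size()) {
                throw new IllegalArgumentException("Cost matrix must be square");
            }
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] m = parse("[0 1; 10 0]");
        System.out.println(m[1][0]); // prints 10.0
    }
}
```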

Version:
$Revision: 15519 $
Author:
Len Trigg (len@reeltwo.com)
  • Field Details

    • MATRIX_ON_DEMAND

      public static final int MATRIX_ON_DEMAND
      Constant indicating that the cost matrix is loaded on demand.
    • MATRIX_SUPPLIED

      public static final int MATRIX_SUPPLIED
      Constant indicating that an explicitly supplied cost matrix is used.
    • TAGS_MATRIX_SOURCE

      public static final Tag[] TAGS_MATRIX_SOURCE
      Specifies the possible sources of the cost matrix.
  • Constructor Details

    • CostSensitiveClassifier

      public CostSensitiveClassifier()
      Default constructor.
  • Method Details

    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class RandomizableSingleClassifierEnhancer
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -M
        Minimize expected misclassification cost. Default is to
        reweight training instances according to costs per class
       -C <cost file name>
        File name of a cost matrix to use. If this is not supplied,
        a cost matrix will be loaded on demand. The name of the
        on-demand file is the relation name of the training data
        plus ".cost", and the path to the on-demand file is
        specified with the -N option.
       -N <directory>
        Name of a directory to search for cost files when loading
        costs on demand (default current directory).
       -cost-matrix <matrix>
        The cost matrix in Matlab single line format.
       -S <num>
        Random number seed.
        (default 1)
       -D
        If set, classifier is run in debug mode and
        may output additional info to the console
       -W
        Full name of base classifier.
        (default: weka.classifiers.rules.ZeroR)
       
       Options specific to classifier weka.classifiers.rules.ZeroR:
       
       -D
        If set, classifier is run in debug mode and
        may output additional info to the console
      Options after -- are passed to the designated classifier.

      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class RandomizableSingleClassifierEnhancer
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
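As a hedged illustration of the option layout above, the sketch below assembles an array of the kind `setOptions` accepts. It only builds strings and has no Weka dependency; the helper `buildOptions` is hypothetical. Options for the base classifier follow the `--` separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: assembling an options array for setOptions(String[]). Helper is hypothetical. */
public class OptionsArraySketch {

    static String[] buildOptions(String costMatrix, boolean minimizeExpectedCost,
                                 int seed, String baseClassifier, String... baseOptions) {
        List<String> opts = new ArrayList<>();
        if (minimizeExpectedCost) {
            opts.add("-M");                 // predict the minimum expected cost class
        }
        opts.add("-cost-matrix");
        opts.add(costMatrix);               // Matlab single-line format
        opts.add("-S");
        opts.add(Integer.toString(seed));   // random number seed
        opts.add("-W");
        opts.add(baseClassifier);           // full class name of the base classifier
        if (baseOptions.length > 0) {
            opts.add("--");                 // everything after this goes to the base classifier
            opts.addAll(Arrays.asList(baseOptions));
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = buildOptions("[0 1; 10 0]", true, 1,
                "weka.classifiers.rules.ZeroR", "-D");
        System.out.println(String.join(" ", opts));
    }
}
```

With Weka on the classpath, the resulting array would be passed to `setOptions(String[])` on a `CostSensitiveClassifier` instance; the sketch itself stops short of that call.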
    • getOptions

      public String[] getOptions()
      Gets the current settings of the Classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class RandomizableSingleClassifierEnhancer
      Returns:
      an array of strings suitable for passing to setOptions
    • globalInfo

      public String globalInfo()
      Returns:
      a description of the classifier suitable for displaying in the explorer/experimenter gui
    • costMatrixSourceTipText

      public String costMatrixSourceTipText()
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getCostMatrixSource

      public SelectedTag getCostMatrixSource()
      Gets the method used to locate the cost matrix. This will be either MATRIX_ON_DEMAND or MATRIX_SUPPLIED.
      Returns:
      the cost matrix source.
    • setCostMatrixSource

      public void setCostMatrixSource(SelectedTag newMethod)
      Sets the source location of the cost matrix. Values other than MATRIX_ON_DEMAND or MATRIX_SUPPLIED will be ignored.
      Parameters:
      newMethod - the cost matrix location method.
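Because values other than MATRIX_ON_DEMAND or MATRIX_SUPPLIED are ignored, the setter behaves like a guarded assignment. The sketch below models that behaviour with plain `int` constants. The numeric values and the on-demand default are assumptions for illustration only, so real code should use the constants defined on `CostSensitiveClassifier`.

```java
/**
 * Sketch of a setter that ignores unsupported values. Constant values and the
 * default are hypothetical; they are not read from Weka.
 */
public class MatrixSourceSketch {
    static final int MATRIX_ON_DEMAND = 1;   // hypothetical value
    static final int MATRIX_SUPPLIED = 2;    // hypothetical value

    private int source = MATRIX_ON_DEMAND;   // assumed default

    void setSource(int newSource) {
        // Anything other than the two known sources leaves the setting unchanged.
        if (newSource == MATRIX_ON_DEMAND || newSource == MATRIX_SUPPLIED) {
            source = newSource;
        }
    }

    int getSource() {
        return source;
    }

    public static void main(String[] args) {
        MatrixSourceSketch s = new MatrixSourceSketch();
        s.setSource(MATRIX_SUPPLIED);
        s.setSource(99); // ignored
        System.out.println(s.getSource() == MATRIX_SUPPLIED); // prints true
    }
}
```

In Weka itself the argument is a `SelectedTag`, which can be constructed as `new SelectedTag(CostSensitiveClassifier.MATRIX_SUPPLIED, CostSensitiveClassifier.TAGS_MATRIX_SOURCE)`.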
    • onDemandDirectoryTipText

      public String onDemandDirectoryTipText()
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getOnDemandDirectory

      public File getOnDemandDirectory()
      Returns the directory that will be searched for cost files when loading on demand.
      Returns:
      The cost file search directory.
    • setOnDemandDirectory

      public void setOnDemandDirectory(File newDir)
      Sets the directory that will be searched for cost files when loading on demand.
      Parameters:
      newDir - The cost file search directory.
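According to the `-C` and `-N` descriptions, the on-demand cost file is named after the training data's relation name plus `.cost` and is looked up in the on-demand directory. A minimal sketch of that path construction with `java.io.File` follows. The helper name is hypothetical, and the lookup Weka actually performs may differ in details such as error handling.

```java
import java.io.File;

/** Sketch of deriving the on-demand cost file path (hypothetical helper). */
public class OnDemandCostFileSketch {

    static File costFileFor(File directory, String relationName) {
        // On-demand file name: relation name of the training data plus ".cost".
        return new File(directory, relationName + ".cost");
    }

    public static void main(String[] args) {
        // The documented default search directory is the current directory.
        File dir = new File(System.getProperty("user.dir"));
        File f = costFileFor(dir, "iris");
        System.out.println(f.getName() + " exists: " + f.exists());
    }
}
```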
    • minimizeExpectedCostTipText

      public String minimizeExpectedCostTipText()
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getMinimizeExpectedCost

      public boolean getMinimizeExpectedCost()
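      Gets whether the classifier predicts the class with minimum expected misclassification cost (true) rather than reweighting training instances according to class costs (false).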
      Returns:
      Value of MinimizeExpectedCost.
    • setMinimizeExpectedCost

      public void setMinimizeExpectedCost(boolean newMinimizeExpectedCost)
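      Sets whether the classifier predicts the class with minimum expected misclassification cost (true) rather than reweighting training instances according to class costs (false).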
      Parameters:
      newMinimizeExpectedCost - Value to assign to MinimizeExpectedCost.
    • costMatrixTipText

      public String costMatrixTipText()
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getCostMatrix

      public CostMatrix getCostMatrix()
      Gets the misclassification cost matrix.
      Returns:
      the cost matrix
    • setCostMatrix

      public void setCostMatrix(CostMatrix newCostMatrix)
      Sets the misclassification cost matrix.
      Parameters:
      newCostMatrix - the cost matrix
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Specified by:
      getCapabilities in interface Classifier
      Overrides:
      getCapabilities in class SingleClassifierEnhancer
      Returns:
      the capabilities of this classifier
    • buildClassifier

      public void buildClassifier(Instances data) throws Exception
      Builds the model of the base learner.
      Specified by:
      buildClassifier in interface Classifier
      Parameters:
      data - the training data
      Throws:
      Exception - if the classifier could not be built successfully
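The class description states that when the base classifier cannot handle instance weights and the weights are not uniform, the training data is resampled with replacement according to the weights. The sketch below shows one common way to do that, cumulative-weight (roulette-wheel) selection. It illustrates the technique only and is not Weka's code; for real datasets, Weka's `Instances.resampleWithWeights(Random)` performs weighted resampling. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Sketch of weighted resampling with replacement; not Weka's implementation. */
public class WeightedResampleSketch {

    /** Resample only if the base learner ignores weights and the weights differ. */
    static boolean needsResampling(boolean baseHandlesWeights, double[] weights) {
        if (baseHandlesWeights) {
            return false;
        }
        for (double w : weights) {
            if (w != weights[0]) {
                return true; // non-uniform weights found
            }
        }
        return false;
    }

    /** Draws n indices with replacement, each with probability proportional to its weight. */
    static int[] resample(double[] weights, int n, Random rng) {
        // Running totals of the weights, e.g. {1, 0, 3} -> {1, 1, 4}.
        double[] cumulative = new double[weights.length];
        double total = 0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i];
            cumulative[i] = total;
        }
        int[] chosen = new int[n];
        for (int k = 0; k < n; k++) {
            double r = rng.nextDouble() * total; // uniform in [0, total)
            int idx = 0;
            // First index whose running total exceeds r; zero-weight entries are skipped.
            while (idx < cumulative.length - 1 && cumulative[idx] <= r) {
                idx++;
            }
            chosen[k] = idx;
        }
        return chosen;
    }

    public static void main(String[] args) {
        double[] weights = {1, 0, 3};
        if (needsResampling(false, weights)) {
            int[] sample = resample(weights, weights.length, new Random(1));
            System.out.println(Arrays.toString(sample));
        }
    }
}
```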
    • distributionForInstance

      public double[] distributionForInstance(Instance instance) throws Exception
      Returns class probabilities. When the minimum expected cost approach is chosen, returns a probability of one for the class with the minimum expected misclassification cost (and zero for all other classes). Otherwise, returns the probability distribution produced by the base classifier.
      Specified by:
      distributionForInstance in interface Classifier
      Overrides:
      distributionForInstance in class AbstractClassifier
      Parameters:
      instance - the instance to be classified
      Returns:
      the computed distribution for the given instance
      Throws:
      Exception - if instance could not be classified successfully
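Minimum-expected-cost prediction can be sketched as follows, assuming `costs[actual][predicted]` indexing: the expected cost of predicting class j is the sum over actual classes i of p(i) times cost(i, j), and the returned distribution puts probability one on the cheapest class. Names are hypothetical, and breaking ties toward the lower class index is an arbitrary choice of this sketch.

```java
import java.util.Arrays;

/** Sketch of minimum-expected-cost prediction; assumes costs[actual][predicted]. */
public class MinExpectedCostSketch {

    /** Expected cost of predicting each class j: sum over actual i of p(i) * cost(i, j). */
    static double[] expectedCosts(double[] probs, double[][] costs) {
        int k = probs.length;
        double[] expected = new double[k];
        for (int j = 0; j < k; j++) {
            for (int i = 0; i < k; i++) {
                expected[j] += probs[i] * costs[i][j];
            }
        }
        return expected;
    }

    /** One-hot distribution: probability 1 for the class with minimum expected cost. */
    static double[] minCostDistribution(double[] probs, double[][] costs) {
        double[] expected = expectedCosts(probs, costs);
        int best = 0;
        for (int j = 1; j < expected.length; j++) {
            if (expected[j] < expected[best]) {
                best = j; // ties keep the lower index
            }
        }
        double[] dist = new double[probs.length];
        dist[best] = 1.0;
        return dist;
    }

    public static void main(String[] args) {
        double[] probs = {0.8, 0.2};
        double[][] costs = {{0, 1}, {10, 0}};
        // Predicting 0 costs 0.2 * 10 = 2.0 in expectation; predicting 1 costs 0.8 * 1 = 0.8.
        System.out.println(Arrays.toString(minCostDistribution(probs, costs))); // prints [0.0, 1.0]
    }
}
```

With this 10:1 cost matrix, a 20% chance of class 1 is enough to predict class 1, whereas choosing the most likely class would predict class 0.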
    • distributionsForInstances

      public double[][] distributionsForInstances(Instances insts) throws Exception
      Batch scoring method. Calls the appropriate method for the base learner if it implements BatchPredictor. Otherwise it simply calls the distributionForInstance() method repeatedly.
      Specified by:
      distributionsForInstances in interface BatchPredictor
      Overrides:
      distributionsForInstances in class AbstractClassifier
      Parameters:
      insts - the instances to get predictions for
      Returns:
      an array of probability distributions, one for each instance
      Throws:
      Exception - if a problem occurs
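The delegate-or-loop behaviour described above can be modelled with two hypothetical interfaces standing in for `Classifier` and `BatchPredictor`. This sketch only illustrates the control flow and does not use Weka types.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of batch scoring with a per-instance fallback; interfaces are hypothetical. */
public class BatchFallbackSketch {

    /** Stand-in for a classifier that scores one instance at a time. */
    interface SingleScorer {
        double[] distributionFor(double[] instance);
    }

    /** Stand-in for a classifier that can also score a whole batch. */
    interface BatchScorer extends SingleScorer {
        double[][] distributionsFor(List<double[]> instances);
    }

    static double[][] distributions(SingleScorer base, List<double[]> instances) {
        if (base instanceof BatchScorer) {
            // The base learner supports batch scoring: hand it the whole batch.
            return ((BatchScorer) base).distributionsFor(instances);
        }
        // Fallback: score one instance at a time.
        double[][] result = new double[instances.size()][];
        for (int i = 0; i < instances.size(); i++) {
            result[i] = base.distributionFor(instances.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        SingleScorer constant = x -> new double[] {0.5, 0.5};
        List<double[]> data = new ArrayList<>();
        data.add(new double[] {1.0});
        data.add(new double[] {2.0});
        System.out.println(distributions(constant, data).length); // prints 2
    }
}
```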
    • batchSizeTipText

      public String batchSizeTipText()
      Returns the tool tip text for this property.
      Overrides:
      batchSizeTipText in class AbstractClassifier
      Returns:
      the tool tip for this property
    • setBatchSize

      public void setBatchSize(String size)
      Set the batch size to use. Gets passed through to the base learner if it implements BatchPredictor. Otherwise it is just ignored.
      Specified by:
      setBatchSize in interface BatchPredictor
      Overrides:
      setBatchSize in class AbstractClassifier
      Parameters:
      size - the batch size to use
    • getBatchSize

      public String getBatchSize()
      Gets the preferred batch size from the base learner if it implements BatchPredictor. Returns 1 as the preferred batch size otherwise.
      Specified by:
      getBatchSize in interface BatchPredictor
      Overrides:
      getBatchSize in class AbstractClassifier
      Returns:
      the batch size to use
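The batch-size pass-through documented for `setBatchSize` and `getBatchSize` can be sketched with a hypothetical `BatchCapable` interface standing in for `BatchPredictor`. The sketch does not use Weka types; it only shows the forward-or-ignore logic and the documented default of 1.

```java
/** Sketch of passing a batch size through to a base learner; types are hypothetical. */
public class BatchSizeSketch {

    /** Hypothetical stand-in for the batch-size part of BatchPredictor. */
    interface BatchCapable {
        void setBatchSize(String size);
        String getBatchSize();
    }

    /** Hypothetical base learner that supports batching. */
    static class SimpleBatchLearner implements BatchCapable {
        private String size = "100"; // arbitrary initial value
        public void setBatchSize(String size) { this.size = size; }
        public String getBatchSize() { return size; }
    }

    private final Object base;

    BatchSizeSketch(Object base) {
        this.base = base;
    }

    void setBatchSize(String size) {
        if (base instanceof BatchCapable) {
            ((BatchCapable) base).setBatchSize(size); // pass through to the base learner
        }
        // Otherwise the value is simply ignored.
    }

    String getBatchSize() {
        if (base instanceof BatchCapable) {
            return ((BatchCapable) base).getBatchSize();
        }
        return "1"; // default when the base learner cannot batch
    }

    public static void main(String[] args) {
        BatchSizeSketch plain = new BatchSizeSketch(new Object());
        plain.setBatchSize("50");
        System.out.println(plain.getBatchSize()); // prints 1

        BatchSizeSketch batched = new BatchSizeSketch(new SimpleBatchLearner());
        batched.setBatchSize("50");
        System.out.println(batched.getBatchSize()); // prints 50
    }
}
```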
    • implementsMoreEfficientBatchPrediction

      public boolean implementsMoreEfficientBatchPrediction()
      Returns true if the base classifier implements BatchPredictor and is able to generate batch predictions efficiently.
      Specified by:
      implementsMoreEfficientBatchPrediction in interface BatchPredictor
      Overrides:
      implementsMoreEfficientBatchPrediction in class AbstractClassifier
      Returns:
      true if the base classifier can generate batch predictions efficiently
    • graphType

      public int graphType()
      Returns the type of graph this classifier represents.
      Specified by:
      graphType in interface Drawable
      Returns:
      the type of graph this classifier represents
    • graph

      public String graph() throws Exception
      Returns a graph describing the classifier, if possible.
      Specified by:
      graph in interface Drawable
      Returns:
      the graph of the classifier in dotty format
      Throws:
      Exception - if the classifier cannot be graphed
    • toString

      public String toString()
      Outputs a representation of this classifier.
      Overrides:
      toString in class Object
      Returns:
      a string representation of the classifier
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractClassifier
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain the following arguments: -t training file [-T test file] [-c class index]