Class Discretize

java.lang.Object
weka.filters.Filter
weka.filters.supervised.attribute.Discretize
All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedAttributesHandler, WeightedInstancesHandler, SupervisedFilter

An instance filter that discretizes a range of numeric attributes in the dataset into nominal attributes. Discretization is by Fayyad & Irani's MDL method (the default).
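
As a rough illustration of the Fayyad & Irani MDL criterion named above, the sketch below decides whether one candidate cut point should be accepted. It is a minimal sketch and not Weka's implementation: the class and method names are hypothetical, and it assumes the published form of the criterion, where a split is accepted when its information gain exceeds (log2(N - 1) + delta) / N, with delta = log2(3^k - 2) - (k*Ent(S) - k1*Ent(S1) - k2*Ent(S2)).

```java
import java.util.Arrays;

// Illustrative sketch only; names are hypothetical and not part of the Weka API.
final class MdlSplitSketch {

    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Shannon entropy (base 2) of a vector of per-class instance counts.
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * log2(p);
            }
        }
        return h;
    }

    // Number of classes that actually occur in the subset.
    static int classesPresent(int[] counts) {
        return (int) Arrays.stream(counts).filter(c -> c > 0).count();
    }

    // left/right hold per-class counts on each side of the candidate cut point.
    // Assumes both arrays have the same length and at least two instances in total.
    static boolean acceptSplit(int[] left, int[] right) {
        int[] all = new int[left.length];
        for (int i = 0; i < left.length; i++) {
            all[i] = left[i] + right[i];
        }
        int n = Arrays.stream(all).sum();
        int n1 = Arrays.stream(left).sum();
        int n2 = Arrays.stream(right).sum();

        double ent = entropy(all);
        double ent1 = entropy(left);
        double ent2 = entropy(right);
        double gain = ent - ((double) n1 / n) * ent1 - ((double) n2 / n) * ent2;

        int k = classesPresent(all);
        int k1 = classesPresent(left);
        int k2 = classesPresent(right);
        double delta = log2(Math.pow(3, k) - 2) - (k * ent - k1 * ent1 - k2 * ent2);

        // The split is kept only if the gain pays for the cost of encoding it.
        return gain > (log2(n - 1) + delta) / n;
    }
}
```

A cut that separates two classes perfectly, such as `acceptSplit(new int[] {10, 0}, new int[] {0, 10})`, is accepted, while a cut that leaves the class mix unchanged on both sides is rejected.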

For more information, see:

Usama M. Fayyad, Keki B. Irani: Multi-interval discretization of continuous-valued attributes for classification learning. In: Thirteenth International Joint Conference on Artificial Intelligence, 1022-1027, 1993.

Igor Kononenko: On Biases in Estimating Multi-Valued Attributes. In: 14th International Joint Conference on Artificial Intelligence, 1034-1040, 1995.

BibTeX:

 @inproceedings{Fayyad1993,
    author = {Usama M. Fayyad and Keki B. Irani},
    booktitle = {Thirteenth International Joint Conference on Artificial Intelligence},
    pages = {1022-1027},
    publisher = {Morgan Kaufmann Publishers},
    title = {Multi-interval discretization of continuous-valued attributes for classification learning},
    volume = {2},
    year = {1993}
 }
 
 @inproceedings{Kononenko1995,
    author = {Igor Kononenko},
    booktitle = {14th International Joint Conference on Artificial Intelligence},
    pages = {1034-1040},
    title = {On Biases in Estimating Multi-Valued Attributes},
    year = {1995},
    PS = {http://ai.fri.uni-lj.si/papers/kononenko95-ijcai.ps.gz}
 }
 

Valid options are:

 -R <col1,col2-col4,...>
  Specifies the list of columns to discretize. 'first' and 'last' are valid indexes.
  (default none)
 -V
  Invert matching sense of column indexes.
 -D
  Output binary attributes for discretized attributes.
 -Y
  Use bin numbers rather than ranges for discretized attributes.
 -E
  Use better encoding of split point for MDL.
 -K
  Use Kononenko's MDL criterion.
 -precision <integer>
  Precision for bin boundary labels.
  (default = 6 decimal places).
 -spread-attribute-weight
  When generating binary attributes, spread the weight of the old
  attribute across the new attributes instead of giving each new attribute the old weight.
Version:
$Revision: 14509 $
Author:
Len Trigg (trigg@cs.waikato.ac.nz), Eibe Frank (eibe@cs.waikato.ac.nz)
  • Constructor Details

    • Discretize

      public Discretize()
      Constructor - initialises the filter
  • Method Details

    • listOptions

      public Enumeration<Option> listOptions()
      Gets an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Filter
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -R <col1,col2-col4,...>
        Specifies the list of columns to discretize. 'first' and 'last' are valid indexes.
        (default none)
       -V
        Invert matching sense of column indexes.
       -D
        Output binary attributes for discretized attributes.
       -Y
        Use bin numbers rather than ranges for discretized attributes.
       -E
        Use better encoding of split point for MDL.
       -K
        Use Kononenko's MDL criterion.
       -precision <integer>
        Precision for bin boundary labels.
        (default = 6 decimal places).
       -spread-attribute-weight
        When generating binary attributes, spread the weight of the old
        attribute across the new attributes instead of giving each new attribute the old weight.
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Filter
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the filter.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Filter
      Returns:
      an array of strings suitable for passing to setOptions
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Filter
      Returns:
      the capabilities of this object
    • setInputFormat

      public boolean setInputFormat(Instances instanceInfo) throws Exception
      Sets the format of the input instances.
      Overrides:
      setInputFormat in class Filter
      Parameters:
      instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
      Returns:
      true if the outputFormat may be collected immediately
      Throws:
      Exception - if the input format can't be set successfully
    • input

      public boolean input(Instance instance)
      Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require all instances be read before producing output.
      Overrides:
      input in class Filter
      Parameters:
      instance - the input instance
      Returns:
      true if the filtered instance may now be collected with output().
      Throws:
      IllegalStateException - if no input format has been defined.
    • batchFinished

      public boolean batchFinished()
      Signifies that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.
      Overrides:
      batchFinished in class Filter
      Returns:
      true if there are instances pending output
      Throws:
      IllegalStateException - if no input structure has been defined
    • globalInfo

      public String globalInfo()
      Returns a string describing this filter
      Returns:
      a description of the filter suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • spreadAttributeWeightTipText

      public String spreadAttributeWeightTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSpreadAttributeWeight

      public void setSpreadAttributeWeight(boolean p)
      If true, when generating binary attributes, the weight of the old attribute is spread across the new attributes instead of each new attribute receiving the old weight.
      Parameters:
      p - whether weight is spread
    • getSpreadAttributeWeight

      public boolean getSpreadAttributeWeight()
      If true, when generating binary attributes, the weight of the old attribute is spread across the new attributes instead of each new attribute receiving the old weight.
      Returns:
      whether weight is spread
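
      The interaction between binary output and weight spreading can be sketched as follows. This is only an illustration under stated assumptions, not Weka's code: the names are hypothetical, one binary attribute per cut point is assumed, and "spreading" is assumed to mean dividing the old attribute weight equally among the new attributes.

```java
// Illustrative sketch only; names are hypothetical and not part of the Weka API.
final class BinaryEncodingSketch {

    // Assumed encoding: one 0/1 indicator per cut point,
    // 0 if value <= cut point, 1 if value > cut point.
    static int[] encode(double value, double[] cutPoints) {
        int[] bits = new int[cutPoints.length];
        for (int j = 0; j < cutPoints.length; j++) {
            bits[j] = value <= cutPoints[j] ? 0 : 1;
        }
        return bits;
    }

    // Weight given to each new binary attribute.
    // With spreading, the old weight is shared equally; without, it is copied.
    static double newAttributeWeight(double oldWeight, int numNewAttributes, boolean spread) {
        return spread ? oldWeight / numNewAttributes : oldWeight;
    }
}
```

      Under these assumptions, an attribute of weight 1.5 replaced by three binary attributes yields weights of 0.5 each with spreading enabled, and 1.5 each without it.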
    • binRangePrecisionTipText

      public String binRangePrecisionTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setBinRangePrecision

      public void setBinRangePrecision(int p)
      Set the precision for bin boundaries. Only affects the boundary values used in the labels for the converted attributes; internal cutpoints are at full double precision.
      Parameters:
      p - the precision for bin boundaries
    • getBinRangePrecision

      public int getBinRangePrecision()
      Get the precision for bin boundaries. Only affects the boundary values used in the labels for the converted attributes; internal cutpoints are at full double precision.
      Returns:
      the precision for bin boundaries
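
      To illustrate that the precision setting changes only the labels and not the cut points, here is a minimal sketch that builds bin labels from full-precision cut points. The class, the rounding approach, and the exact label layout ("(-inf-x]", "(x-y]", "(y-inf)", and "All" for a single interval) are assumptions made for illustration; the labels Weka actually produces may differ in detail.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative sketch only; names and label layout are assumptions, not Weka API.
final class BinLabelSketch {

    // Rounds to the given number of decimal places and drops trailing zeros.
    static String round(double v, int precision) {
        return BigDecimal.valueOf(v)
                .setScale(precision, RoundingMode.HALF_UP)
                .stripTrailingZeros()
                .toPlainString();
    }

    // Builds one label per bin; the cutPoints array itself is never modified.
    static String[] labels(double[] cutPoints, int precision) {
        if (cutPoints == null || cutPoints.length == 0) {
            return new String[] {"All"};
        }
        int n = cutPoints.length;
        String[] out = new String[n + 1];
        out[0] = "(-inf-" + round(cutPoints[0], precision) + "]";
        for (int i = 1; i < n; i++) {
            out[i] = "(" + round(cutPoints[i - 1], precision) + "-"
                    + round(cutPoints[i], precision) + "]";
        }
        out[n] = "(" + round(cutPoints[n - 1], precision) + "-inf)";
        return out;
    }
}
```

      With cut points {2.123456789, 5.5} and a precision of 2, the sketch yields the labels "(-inf-2.12]", "(2.12-5.5]" and "(5.5-inf)", while the stored cut point remains 2.123456789.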
    • makeBinaryTipText

      public String makeBinaryTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getMakeBinary

      public boolean getMakeBinary()
      Gets whether binary attributes should be made for discretized ones.
      Returns:
      true if attributes will be binarized
    • setMakeBinary

      public void setMakeBinary(boolean makeBinary)
      Sets whether binary attributes should be made for discretized ones.
      Parameters:
      makeBinary - if binary attributes are to be made
    • useBinNumbersTipText

      public String useBinNumbersTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getUseBinNumbers

      public boolean getUseBinNumbers()
      Gets whether bin numbers rather than ranges should be used for discretized attributes.
      Returns:
      true if bin numbers should be used
    • setUseBinNumbers

      public void setUseBinNumbers(boolean useBinNumbers)
      Sets whether bin numbers rather than ranges should be used for discretized attributes.
      Parameters:
      useBinNumbers - if bin numbers should be used
    • useKononenkoTipText

      public String useKononenkoTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getUseKononenko

      public boolean getUseKononenko()
      Gets whether Kononenko's MDL criterion is to be used.
      Returns:
      true if Kononenko's criterion will be used.
    • setUseKononenko

      public void setUseKononenko(boolean useKon)
      Sets whether Kononenko's MDL criterion is to be used.
      Parameters:
      useKon - true if Kononenko's criterion is to be used
    • useBetterEncodingTipText

      public String useBetterEncodingTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getUseBetterEncoding

      public boolean getUseBetterEncoding()
      Gets whether better encoding is to be used for MDL.
      Returns:
      true if the better MDL encoding will be used
    • setUseBetterEncoding

      public void setUseBetterEncoding(boolean useBetterEncoding)
      Sets whether better encoding is to be used for MDL.
      Parameters:
      useBetterEncoding - true if the better encoding is to be used
    • invertSelectionTipText

      public String invertSelectionTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getInvertSelection

      public boolean getInvertSelection()
      Gets whether the column selection is inverted.
      Returns:
      true if all columns except the supplied ones are selected for discretization
    • setInvertSelection

      public void setInvertSelection(boolean invert)
      Sets whether the column selection is inverted. If true, all columns except the supplied ones are selected for discretization. If false, only the supplied columns are selected.
      Parameters:
      invert - the new invert setting
    • attributeIndicesTipText

      public String attributeIndicesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getAttributeIndices

      public String getAttributeIndices()
      Gets the current range selection
      Returns:
      a string containing a comma separated list of ranges
    • setAttributeIndices

      public void setAttributeIndices(String rangeList)
      Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
      Parameters:
      rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
      e.g.: first-3,5,6-last
      Throws:
      IllegalArgumentException - if an invalid range list is supplied
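
      The 1-based range syntax described above (for example first-3,5,6-last) can be pictured with the sketch below, which expands such a string into sorted 0-based indexes and optionally inverts the selection. It is an assumption-laden illustration rather than Weka's parser: the class, its methods, and the `invert` parameter are hypothetical.

```java
import java.util.TreeSet;

// Illustrative sketch only; names are hypothetical and not part of the Weka API.
final class RangeListSketch {

    // Expands a 1-based range list into sorted 0-based attribute indexes.
    // If invert is true, the complement of the selection is returned.
    static int[] toZeroBased(String rangeList, int numAttributes, boolean invert) {
        TreeSet<Integer> selected = new TreeSet<>();
        for (String part : rangeList.split(",")) {
            String p = part.trim();
            int dash = p.indexOf('-');
            int lo;
            int hi;
            if (dash < 0) {
                lo = hi = parseIndex(p, numAttributes);
            } else {
                lo = parseIndex(p.substring(0, dash).trim(), numAttributes);
                hi = parseIndex(p.substring(dash + 1).trim(), numAttributes);
            }
            if (lo > hi) {
                throw new IllegalArgumentException("Invalid range: " + p);
            }
            for (int i = lo; i <= hi; i++) {
                selected.add(i - 1); // convert from 1-based to 0-based
            }
        }
        TreeSet<Integer> result = new TreeSet<>();
        for (int i = 0; i < numAttributes; i++) {
            if (selected.contains(i) != invert) {
                result.add(i);
            }
        }
        return result.stream().mapToInt(Integer::intValue).toArray();
    }

    // Accepts "first", "last", or a 1-based integer within bounds.
    static int parseIndex(String token, int numAttributes) {
        if (token.equals("first")) {
            return 1;
        }
        if (token.equals("last")) {
            return numAttributes;
        }
        int v = Integer.parseInt(token); // NumberFormatException is an IllegalArgumentException
        if (v < 1 || v > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return v;
    }
}
```

      For a dataset with eight attributes, "first-3,5,6-last" selects the 0-based indexes 0, 1, 2, 4, 5, 6 and 7; with the selection inverted, only index 3 remains.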
    • setAttributeIndicesArray

      public void setAttributeIndicesArray(int[] attributes)
      Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
      Parameters:
      attributes - an array containing indexes of attributes to discretize. Since the array will typically come from a program, attributes are indexed from 0.
      Throws:
      IllegalArgumentException - if an invalid set of ranges is supplied
    • getCutPoints

      public double[] getCutPoints(int attributeIndex)
      Gets the cut points for an attribute
      Parameters:
      attributeIndex - the index (from 0) of the attribute to get the cut points of
      Returns:
      an array containing the cut points (or null if the requested attribute isn't being discretized)
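
      Once cut points are available, mapping a numeric value to its bin is a simple lookup. The sketch below shows one way to do this; it is not Weka's code, the names are hypothetical, and it assumes that the cut points are sorted in ascending order, that intervals are closed on the right, and that a null array means a single bin.

```java
// Illustrative sketch only; names are hypothetical and not part of the Weka API.
final class BinLookupSketch {

    // Returns the 0-based bin index for a value.
    // Bin i covers (cutPoints[i-1], cutPoints[i]]; the last bin is open-ended.
    static int binIndex(double value, double[] cutPoints) {
        if (cutPoints == null) {
            return 0; // assumed: no cut points means everything falls into one bin
        }
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return bin;
    }
}
```

      With cut points {2.5, 5.5}, the value 2.5 falls into bin 0, 2.6 into bin 1, and 6.0 into bin 2.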
    • getBinRangesString

      public String getBinRangesString(int attributeIndex)
      Gets the bin ranges string for an attribute
      Parameters:
      attributeIndex - the index (from 0) of the attribute to get the bin ranges string of
      Returns:
      the bin ranges string (or null if the attribute requested has been discretized into only one interval.)
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Filter
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain arguments to the filter: use -h for help