Class MergeNominalValues

All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedAttributesHandler, WeightedInstancesHandler, SupervisedFilter

Merges values of all nominal attributes among the specified attributes, excluding the class attribute, using the CHAID method, but without considering re-splitting of merged subsets. It implements Steps 1 and 2 described by Kass (1980), see

Gordon V. Kass (1980). An Exploratory Technique for Investigating Large Quantities of Categorical Data. Applied Statistics. 29(2):119-127.

Once attribute values have been merged, a chi-squared test using the Bonferroni correction is applied to check if the resulting attribute is a valid predictor, based on the Bonferroni multiplier in Equation 3.2 in Kass (1980). If an attribute does not pass this test, all remaining values (if any) are merged. Nevertheless, useless predictors can slip through without being fully merged, e.g. identifier attributes.
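
The Bonferroni step can be illustrated with a small, self-contained sketch. It assumes that Equation 3.2 in Kass (1980) is the multiplier for free (nominal) predictors, that is, the number of ways to partition c original values into r non-empty groups. The class and method names are hypothetical and are not part of this filter's API, and multiplying the p-value by the multiplier is only one common way to apply the correction.

```java
public class BonferroniMultiplier {

    // Assumed form of the multiplier for a nominal predictor:
    // sum_{i=0}^{r-1} (-1)^i * (r-i)^c / (i! * (r-i)!)
    // where c is the number of original values and r the number of merged groups.
    static double multiplier(int c, int r) {
        double sum = 0.0;
        double sign = 1.0;
        for (int i = 0; i < r; i++) {
            sum += sign * Math.pow(r - i, c) / (factorial(i) * factorial(r - i));
            sign = -sign;
        }
        return sum;
    }

    // Plain factorial as a double; adequate for the small counts used here.
    static double factorial(int n) {
        double f = 1.0;
        for (int k = 2; k <= n; k++) {
            f *= k;
        }
        return f;
    }

    // Bonferroni-adjusted p-value, capped at 1.
    static double adjust(double pValue, int c, int r) {
        return Math.min(1.0, pValue * multiplier(c, r));
    }

    public static void main(String[] args) {
        // Four original values merged into two groups.
        System.out.println(multiplier(4, 2));
        System.out.println(adjust(0.004, 4, 2));
    }
}
```

Under these assumptions, an attribute whose merged values give a raw p-value of 0.004 would still pass at a significance level of 0.05 after adjustment, while a raw p-value of 0.01 would not.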

The code applies the Yates correction when the chi-squared statistic is computed.
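
As a rough illustration of what the Yates correction does, the following sketch computes a chi-squared statistic for a contingency table (rows are attribute values, columns are class values) with and without the correction. It is not the filter's own code: the class name, the method name, and the choice to clamp the corrected difference at zero are assumptions made for this example.

```java
public class YatesChiSquared {

    // Chi-squared statistic for a table of counts. With yates = true,
    // 0.5 is subtracted from each |observed - expected| before squaring
    // (clamped at zero, which is an assumption of this sketch).
    static double chiSquared(double[][] table, boolean yates) {
        int rows = table.length;
        int cols = table[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += table[i][j];
                colSum[j] += table[i][j];
                total += table[i][j];
            }
        }
        if (total <= 0.0) {
            return 0.0; // empty table: nothing to test
        }
        double chi = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double expected = rowSum[i] * colSum[j] / total;
                if (expected <= 0.0) {
                    continue; // skip cells with no expected mass
                }
                double diff = Math.abs(table[i][j] - expected);
                if (yates) {
                    diff = Math.max(0.0, diff - 0.5);
                }
                chi += diff * diff / expected;
            }
        }
        return chi;
    }

    public static void main(String[] args) {
        // Two attribute values, two class values.
        double[][] table = { { 10, 20 }, { 20, 10 } };
        System.out.println(chiSquared(table, false));
        System.out.println(chiSquared(table, true));
    }
}
```

The corrected statistic is never larger than the uncorrected one, so the correction makes the test more conservative, which matters most when cell counts are small.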

Note that the algorithm is quadratic in the number of attribute values for an attribute.

Valid options are:

 -D
  Turns on output of debugging information.
 
 -L <double>
  The significance level (default: 0.05).
 
 -R <range>
  Sets list of attributes to act on (or its inverse). 'first' and 'last' are accepted as well.
  E.g.: first-5,7,9,20-last
  (default: first-last)
 
 -V
  Invert matching sense (i.e. act on all attributes not specified in list)
 
 -O
  Use short identifiers for merged subsets.
 
Version:
$Revision: 14508 $
Author:
Eibe Frank
  • Constructor Details

    • MergeNominalValues

      public MergeNominalValues()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this filter.
      Specified by:
      globalInfo in class SimpleFilter
      Returns:
      a description of the filter suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Filter
      Returns:
      an enumeration of all the available options.
    • getOptions

      public String[] getOptions()
      Gets the current settings of the filter.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Filter
      Returns:
      an array of strings suitable for passing to setOptions
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -D
        Turns on output of debugging information.
       
       -L <double>
        The significance level (default: 0.05).
       
       -R <range>
        Sets list of attributes to act on (or its inverse). 'first' and 'last' are accepted as well.
        E.g.: first-5,7,9,20-last
        (default: first-last)
       
       -V
        Invert matching sense (i.e. act on all attributes not specified in list)
       
       -O
        Use short identifiers for merged subsets.
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Filter
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
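
The flags listed above can be assembled into an option array along the following lines. This is a minimal sketch using only the JDK: `MergeOptionsSketch` and `buildOptions` are hypothetical helpers, not part of this class, and the order in which the flags are emitted is an arbitrary choice of the example.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeOptionsSketch {

    // Builds an option array from the documented flags -D, -L, -R, -V and -O.
    static String[] buildOptions(double significanceLevel, String range,
                                 boolean invert, boolean shortIds, boolean debug) {
        List<String> opts = new ArrayList<>();
        if (debug) {
            opts.add("-D");
        }
        opts.add("-L");
        opts.add(Double.toString(significanceLevel));
        opts.add("-R");
        opts.add(range);
        if (invert) {
            opts.add("-V");
        }
        if (shortIds) {
            opts.add("-O");
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] options = buildOptions(0.01, "first-5,7", false, true, false);
        System.out.println(String.join(" ", options));
    }
}
```

An array built this way is in the form that setOptions expects, one flag or flag argument per element.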
    • significanceLevelTipText

      public String significanceLevelTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getSignificanceLevel

      public double getSignificanceLevel()
      Gets the significance level.
      Returns:
      the significance level
    • setSignificanceLevel

      public void setSignificanceLevel(double sF)
      Sets the significance level.
      Parameters:
      sF - the significance level as a double.
    • attributeIndicesTipText

      public String attributeIndicesTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getAttributeIndices

      public String getAttributeIndices()
      Get the current range selection.
      Returns:
      a string containing a comma separated list of ranges
    • setAttributeIndices

      public void setAttributeIndices(String rangeList)
      Set which attributes are to be acted on (or not, if invert is true).
      Parameters:
      rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
      E.g.: first-3,5,6-last
    • setAttributeIndicesArray

      public void setAttributeIndicesArray(int[] attributes)
      Set which attributes are to be acted on (or not, if invert is true).
      Parameters:
      attributes - an array containing indexes of attributes to select. Since the array will typically come from a program, attributes are indexed from 0.
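
The difference between the 1-based range string accepted by setAttributeIndices and the 0-based array accepted by setAttributeIndicesArray can be shown with a simplified conversion. `RangeSketch` and its methods are hypothetical and use only the JDK. The sketch performs no validation and does not remove duplicates, so it should be read as an illustration of the indexing convention rather than of the library's own range parsing.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSketch {

    // Converts a 1-based range list such as "first-3,5,6-last"
    // into the equivalent 0-based attribute indices.
    static int[] toZeroBased(String rangeList, int numAttributes) {
        List<Integer> indices = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            String[] ends = part.trim().split("-");
            int lo = parse(ends[0], numAttributes);
            int hi = ends.length > 1 ? parse(ends[1], numAttributes) : lo;
            for (int i = lo; i <= hi; i++) {
                indices.add(i - 1); // shift from 1-based to 0-based
            }
        }
        int[] result = new int[indices.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = indices.get(i);
        }
        return result;
    }

    // Resolves 'first' and 'last' to 1-based positions; other tokens are numbers.
    static int parse(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) {
            return 1;
        }
        if (t.equals("last")) {
            return numAttributes;
        }
        return Integer.parseInt(t);
    }

    public static void main(String[] args) {
        // With 8 attributes, "first-3,5,6-last" selects every attribute except the fourth.
        int[] zeroBased = toZeroBased("first-3,5,6-last", 8);
        System.out.println(java.util.Arrays.toString(zeroBased));
    }
}
```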
    • invertSelectionTipText

      public String invertSelectionTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getInvertSelection

      public boolean getInvertSelection()
      Get whether the supplied attributes are to be acted on or all other attributes.
      Returns:
      true if the selection is inverted, i.e. all attributes other than the supplied ones are acted on
    • setInvertSelection

      public void setInvertSelection(boolean invert)
      Set whether selected attributes should be acted on or all other attributes.
      Parameters:
      invert - the new invert setting
    • useShortIdentifiersTipText

      public String useShortIdentifiersTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getUseShortIdentifiers

      public boolean getUseShortIdentifiers()
      Get whether short identifiers are to be output.
      Returns:
      true if short IDs are output
    • setUseShortIdentifiers

      public void setUseShortIdentifiers(boolean b)
      Set whether to output short identifiers for merged values.
      Parameters:
      b - if true, short IDs are output
    • allowAccessToFullInputFormat

      public boolean allowAccessToFullInputFormat()
      We need access to the full input data in determineOutputFormat.
      Overrides:
      allowAccessToFullInputFormat in class SimpleBatchFilter
      Returns:
      whether determineOutputFormat has access to the full input dataset
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Filter
      Returns:
      the capabilities of this object
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Filter
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Runs the filter with the given arguments.
      Parameters:
      args - the commandline arguments