Class RemoveFrequentValues

java.lang.Object
weka.filters.Filter
weka.filters.unsupervised.instance.RemoveFrequentValues
All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, WeightedAttributesHandler, UnsupervisedFilter

public class RemoveFrequentValues extends Filter implements OptionHandler, UnsupervisedFilter, WeightedAttributesHandler
Determines which values (frequent or infrequent ones) of a nominal attribute are retained and filters the instances accordingly. Values with the same frequency are kept in the order in which they appear in the original Instances object. For example, given the values "1,2,3,4" with the frequencies "10,5,5,3", keeping the 2 most common values returns "1,2": the value "2" comes before "3", even though both have the same frequency.
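The tie-breaking rule described above can be illustrated with a small sketch in plain Java. It does not use Weka and is not the filter's actual implementation; the class `TieBreakSketch` and the method `keepMostFrequent` are hypothetical names chosen for illustration. It relies on a stable sort, so values with equal counts keep their original order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: illustrates the tie-breaking rule only.
class TieBreakSketch {

    /** Returns the n most frequent labels; ties go to the label that appears first. */
    static List<String> keepMostFrequent(String[] labels, int[] counts, int n) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < labels.length; i++) {
            order.add(i);
        }
        // List.sort is stable, so equal counts keep their original relative order.
        order.sort((a, b) -> Integer.compare(counts[b], counts[a]));

        List<String> kept = new ArrayList<>();
        for (int i = 0; i < Math.min(n, order.size()); i++) {
            kept.add(labels[order.get(i)]);
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] labels = {"1", "2", "3", "4"};
        int[] counts = {10, 5, 5, 3};
        System.out.println(keepMostFrequent(labels, counts, 2)); // prints [1, 2]
    }
}
```

With the frequencies from the example, "2" and "3" tie at 5, and "2" is kept because it appears first.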

Valid options are:

 -C <num>
  Choose attribute to be used for selection.
 
 -N <num>
  Number of values to retain for the specified attribute,
  i.e. the ones with the most instances (default 2).
 
 -L
  Instead of values with the most instances the ones with the 
  least are retained.
 
 -H
  When selecting on nominal attributes, removes header
  references to excluded values.
 
 -V
  Invert matching sense.
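
As a rough sketch of how the flags above combine into an option array, the following plain-Java snippet assembles one. The class `OptionsSketch` and the method `buildOptions` are hypothetical and not part of Weka, and the attribute index "1" is only an example value; the resulting array has the shape that `setOptions(String[] options)` accepts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: assembles the flags listed above.
class OptionsSketch {

    static String[] buildOptions(String attIndex, int numValues,
                                 boolean useLeast, boolean modifyHeader, boolean invert) {
        List<String> opts = new ArrayList<>();
        opts.add("-C");                          // attribute used for selection
        opts.add(attIndex);
        opts.add("-N");                          // number of values to retain
        opts.add(Integer.toString(numValues));
        if (useLeast) opts.add("-L");            // retain the least frequent values
        if (modifyHeader) opts.add("-H");        // drop excluded values from the header
        if (invert) opts.add("-V");              // invert matching sense
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] opts = buildOptions("1", 3, false, true, false);
        // The array could then be passed to RemoveFrequentValues.setOptions(opts).
        System.out.println(String.join(" ", opts)); // prints -C 1 -N 3 -H
    }
}
```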
 
Version:
$Revision: 14508 $
Author:
FracPete (fracpete at waikato dot ac dot nz)
  • Constructor Details

    • RemoveFrequentValues

      public RemoveFrequentValues()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this filter
      Returns:
      a description of the filter suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Filter
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -C <num>
        Choose attribute to be used for selection.
       
       -N <num>
        Number of values to retain for the specified attribute,
        i.e. the ones with the most instances (default 2).
       
       -L
        Instead of values with the most instances the ones with the 
        least are retained.
       
       -H
        When selecting on nominal attributes, removes header
        references to excluded values.
       
       -V
        Invert matching sense.
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Filter
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the filter.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Filter
      Returns:
      an array of strings suitable for passing to setOptions
    • attributeIndexTipText

      public String attributeIndexTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getAttributeIndex

      public String getAttributeIndex()
      Get the index of the attribute used.
      Returns:
      the index of the attribute
    • setAttributeIndex

      public void setAttributeIndex(String attIndex)
      Sets index of the attribute used.
      Parameters:
      attIndex - the index of the attribute
    • numValuesTipText

      public String numValuesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getNumValues

      public int getNumValues()
      Gets how many values are retained
      Returns:
      how many values are retained
    • setNumValues

      public void setNumValues(int numValues)
      Sets how many values are retained
      Parameters:
      numValues - the number of values to retain
    • useLeastValuesTipText

      public String useLeastValuesTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getUseLeastValues

      public boolean getUseLeastValues()
      Gets whether to use values with least or most instances
      Returns:
      true if values with least instances are retained
    • setUseLeastValues

      public void setUseLeastValues(boolean leastValues)
      Sets whether to use values with least or most instances
      Parameters:
      leastValues - whether values with least or most instances are retained
    • modifyHeaderTipText

      public String modifyHeaderTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getModifyHeader

      public boolean getModifyHeader()
      Gets whether the header will be modified when selecting on nominal attributes.
      Returns:
      true if so.
    • setModifyHeader

      public void setModifyHeader(boolean newModifyHeader)
      Sets whether the header will be modified when selecting on nominal attributes.
      Parameters:
      newModifyHeader - true if so.
    • invertSelectionTipText

      public String invertSelectionTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getInvertSelection

      public boolean getInvertSelection()
      Gets whether the selected values are to be removed or kept.
      Returns:
      true if the selected values will be kept
    • setInvertSelection

      public void setInvertSelection(boolean invert)
      Set whether selected values should be removed or kept. If true the selected values are kept and unselected values are deleted.
      Parameters:
      invert - the new invert setting
    • isNominal

      public boolean isNominal()
      Returns true if selection attribute is nominal.
      Returns:
      true if selection attribute is nominal
    • determineValues

      public void determineValues(Instances inst)
      Determines the values to retain. The number retained is always at least 1 and at most the number of distinct values.
      Parameters:
      inst - the Instances from which the values to keep are determined
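
The bound described for determineValues (at least 1, at most the number of distinct values) amounts to clamping the requested count. A minimal plain-Java sketch follows; `RetainCountSketch` and `effectiveCount` are hypothetical names, and this is an illustration of the stated bound rather than the filter's actual code.

```java
// Hypothetical helper, not part of Weka: clamps the requested number of values.
class RetainCountSketch {

    /** Clamps the requested count so it is at least 1 and at most distinctValues. */
    static int effectiveCount(int requested, int distinctValues) {
        // Assumes distinctValues >= 1, i.e. the attribute has at least one value.
        return Math.max(1, Math.min(requested, distinctValues));
    }

    public static void main(String[] args) {
        System.out.println(effectiveCount(0, 4));  // prints 1
        System.out.println(effectiveCount(2, 4));  // prints 2
        System.out.println(effectiveCount(10, 4)); // prints 4
    }
}
```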
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Filter
      Returns:
      the capabilities of this object
    • setInputFormat

      public boolean setInputFormat(Instances instanceInfo) throws Exception
      Sets the format of the input instances.
      Overrides:
      setInputFormat in class Filter
      Parameters:
      instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
      Returns:
      true if the outputFormat can be collected immediately
      Throws:
      UnsupportedAttributeTypeException - if the specified attribute is not nominal.
      Exception - if the inputFormat can't be set successfully
    • input

      public boolean input(Instance instance)
      Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require all instances be read before producing output.
      Overrides:
      input in class Filter
      Parameters:
      instance - the input instance
      Returns:
      true if the filtered instance may now be collected with output().
      Throws:
      IllegalStateException - if no input format has been set.
    • batchFinished

      public boolean batchFinished()
      Signifies that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances.
      Overrides:
      batchFinished in class Filter
      Returns:
      true if there are instances pending output
      Throws:
      IllegalStateException - if no input structure has been defined
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Filter
      Returns:
      the revision
    • main

      public static void main(String[] argv)
      Main method for testing this class.
      Parameters:
      argv - should contain arguments to the filter: use -h for help