Class SubsetByExpression

All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, WeightedAttributesHandler, WeightedInstancesHandler

public class SubsetByExpression extends SimpleBatchFilter implements WeightedInstancesHandler, WeightedAttributesHandler
Filters instances according to a user-specified expression.

Examples:
- extracting only mammals and birds from the 'zoo' UCI dataset:
(CLASS is 'mammal') or (CLASS is 'bird')
- extracting only animals with at least 2 legs from the 'zoo' UCI dataset:
(ATT14 >= 2)
- extracting only instances with non-missing 'wage-increase-second-year'
from the 'labor' UCI dataset:
not ismissing(ATT3)
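The expressions above can also be applied programmatically. The following is a minimal sketch, not the authors' reference usage. It assumes the Weka jar is on the classpath and that the class lives in the package `weka.filters.unsupervised.instance`, which this page does not state. The dataset, the class name `SubsetByExpressionSketch` and the helper methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.filters.Filter;
// Assumption: package name is not given on this page.
import weka.filters.unsupervised.instance.SubsetByExpression;

public class SubsetByExpressionSketch {

    /** Builds a tiny, made-up dataset: numeric "legs" (ATT1) plus a nominal class. */
    public static Instances toyData() {
        ArrayList<Attribute> atts = new ArrayList<>();
        atts.add(new Attribute("legs"));
        atts.add(new Attribute("type", Arrays.asList("mammal", "bird", "fish")));
        Instances data = new Instances("toy-zoo", atts, 0);
        data.setClassIndex(1);
        // Nominal values are stored as the index of the label.
        data.add(new DenseInstance(1.0, new double[] {4, 0})); // mammal, 4 legs
        data.add(new DenseInstance(1.0, new double[] {2, 1})); // bird, 2 legs
        data.add(new DenseInstance(1.0, new double[] {0, 2})); // fish, 0 legs
        return data;
    }

    /** Returns the instances of data for which the expression holds. */
    public static Instances subset(Instances data, String expression) throws Exception {
        SubsetByExpression filter = new SubsetByExpression();
        filter.setExpression(expression);
        filter.setInputFormat(data); // must be called after the options are set
        return Filter.useFilter(data, filter);
    }

    public static void main(String[] args) throws Exception {
        Instances data = toyData();
        System.out.println(subset(data, "ATT1 >= 2"));
        System.out.println(subset(data, "(CLASS is 'mammal') or (CLASS is 'bird')"));
    }
}
```

Attribute references such as `ATT1` are 1-based, and `CLASS` refers to whichever attribute has been set as the class.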

Valid options are:

 -E <expr>
  The expression to use for filtering
  (default: true).
 -F
  Apply the filter to instances that arrive after the first
  (training) batch. The default is to not apply the filter (i.e.,
  always return the instance)
 -output-debug-info
  If set, filter is run in debug mode and
  may output additional info to the console
 -do-not-check-capabilities
  If set, filter capabilities are not checked when input format is set
  (use with caution).
Version:
$Revision: 14508 $
Author:
fracpete (fracpete at waikato dot ac dot nz)
  • Constructor Details

    • SubsetByExpression

      public SubsetByExpression()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this filter.
      Specified by:
      globalInfo in class SimpleFilter
      Returns:
      a description of the filter suitable for displaying in the Explorer/Experimenter GUI
    • mayRemoveInstanceAfterFirstBatchDone

      public boolean mayRemoveInstanceAfterFirstBatchDone()
      Even after the first batch has been completed, SubsetByExpression may return false from input(), so the instance is not made available immediately. This happens when the user has opted to apply the filter to instances that arrive after the first batch, rather than just passing them through.
      Overrides:
      mayRemoveInstanceAfterFirstBatchDone in class Filter
      Returns:
      true if this filter may remove (consume) input instances after the first batch has been completed.
    • input

      public boolean input(Instance instance) throws Exception
      Inputs an instance for filtering. The filter requires all training instances to be read before producing output (calling the method batchFinished() makes the data available). If this instance is part of a new batch, m_NewBatch is set to false.
      Overrides:
      input in class SimpleBatchFilter
      Parameters:
      instance - the input instance
      Returns:
      true if the filtered instance may now be collected with output().
      Throws:
      IllegalStateException - if no input structure has been defined
      Exception - if something goes wrong
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Filter
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -E <expr>
        The expression to use for filtering
        (default: true).
       -F
        Apply the filter to instances that arrive after the first
        (training) batch. The default is to not apply the filter (i.e.,
        always return the instance)
       -output-debug-info
        If set, filter is run in debug mode and
        may output additional info to the console
       -do-not-check-capabilities
        If set, filter capabilities are not checked when input format is set
        (use with caution).
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Filter
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
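As an illustration of the option handling described above, the sketch below parses a command-line style string into the filter and reads the settings back. It assumes the Weka jar is on the classpath and that the class lives in `weka.filters.unsupervised.instance` (not stated on this page). The class name `SubsetOptionsSketch` and the helper `configure` are hypothetical.

```java
import weka.core.Utils;
// Assumption: package name is not given on this page.
import weka.filters.unsupervised.instance.SubsetByExpression;

public class SubsetOptionsSketch {

    /** Creates a filter and configures it from a single option string. */
    public static SubsetByExpression configure(String commandLine) throws Exception {
        SubsetByExpression filter = new SubsetByExpression();
        // splitOptions honours the double quotes around the expression.
        filter.setOptions(Utils.splitOptions(commandLine));
        return filter;
    }

    public static void main(String[] args) throws Exception {
        SubsetByExpression filter = configure("-E \"not ismissing(ATT3)\" -F");
        System.out.println(filter.getExpression());
        System.out.println(filter.getFilterAfterFirstBatch());
        // getOptions returns an array that can be passed back to setOptions.
        System.out.println(Utils.joinOptions(filter.getOptions()));
    }
}
```

Quoting the expression matters because it usually contains spaces; without the quotes each token would be treated as a separate option.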
    • getOptions

      public String[] getOptions()
      Gets the current settings of the filter.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Filter
      Returns:
      an array of strings suitable for passing to setOptions
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the Capabilities of this filter.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Filter
      Returns:
      the capabilities of this object
    • setExpression

      public void setExpression(String value)
      Sets the expression used for filtering.
      Parameters:
      value - the expression
    • getExpression

      public String getExpression()
      Returns the expression used for filtering.
      Returns:
      the expression
    • expressionTipText

      public String expressionTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • setFilterAfterFirstBatch

      public void setFilterAfterFirstBatch(boolean b)
      Sets whether to apply the filter to instances that arrive once the first (training) batch has been seen. The default is to not apply the filter and simply pass each input instance through unchanged. This ensures that, when the filter is used in the FilteredClassifier, a test instance does not get "consumed" by the filter and a prediction is always generated.
      Parameters:
      b - true if the filter should be applied to instances that arrive after the first (training) batch has been processed.
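The batch behaviour controlled by this flag can be pictured with a toy model in plain Java. This is not Weka code and does not reflect how the filter is implemented. The class `FilterAfterFirstBatchModel`, its `Predicate` stand-in for the expression, and the integer "instances" are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Toy model only: mimics the documented batch behaviour, not Weka's code. */
public class FilterAfterFirstBatchModel {

    private final Predicate<Integer> expression;  // stands in for the -E expression
    private final boolean filterAfterFirstBatch;  // stands in for the -F flag
    private boolean firstBatchDone = false;

    public FilterAfterFirstBatchModel(Predicate<Integer> expression,
                                      boolean filterAfterFirstBatch) {
        this.expression = expression;
        this.filterAfterFirstBatch = filterAfterFirstBatch;
    }

    /** The first batch is always filtered; later batches only if the flag is set. */
    public List<Integer> processBatch(List<Integer> batch) {
        boolean apply = !firstBatchDone || filterAfterFirstBatch;
        List<Integer> out = new ArrayList<>();
        for (Integer value : batch) {
            if (!apply || expression.test(value)) {
                out.add(value);
            }
        }
        firstBatchDone = true;
        return out;
    }

    public static void main(String[] args) {
        // Default behaviour: later batches pass through untouched.
        FilterAfterFirstBatchModel passThrough =
            new FilterAfterFirstBatchModel(legs -> legs >= 2, false);
        System.out.println(passThrough.processBatch(Arrays.asList(0, 2, 4)));
        System.out.println(passThrough.processBatch(Arrays.asList(0, 4)));

        // With the flag set, later batches are filtered as well.
        FilterAfterFirstBatchModel alwaysFilter =
            new FilterAfterFirstBatchModel(legs -> legs >= 2, true);
        alwaysFilter.processBatch(Arrays.asList(0, 2, 4));
        System.out.println(alwaysFilter.processBatch(Arrays.asList(0, 4)));
    }
}
```

In the pass-through case every test instance survives the second batch, which is the property the FilteredClassifier relies on to produce a prediction for each one.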
    • getFilterAfterFirstBatch

      public boolean getFilterAfterFirstBatch()
      Gets whether the filter is applied to instances that arrive once the first (training) batch has been seen. The default is to not apply the filter and simply pass each input instance through unchanged. This ensures that, when the filter is used in the FilteredClassifier, a test instance does not get "consumed" by the filter and a prediction is always generated.
      Returns:
      true if the filter should be applied to instances that arrive after the first (training) batch has been processed.
    • filterAfterFirstBatchTipText

      public String filterAfterFirstBatchTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the Explorer/Experimenter GUI
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Filter
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for running this filter.
      Parameters:
      args - arguments for the filter: use -h for help