Package weka.filters

Class Filter

java.lang.Object
weka.filters.Filter
All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler
Direct Known Subclasses:
AbstractTimeSeries, Add, AddCluster, AddExpression, AddID, AddNoise, AddUserFields, AddValues, AllFilter, AttributeSelection, ChangeDateFormat, ClassOrder, ClusterMembership, Copy, Discretize, FirstOrder, MakeIndicator, MergeTwoValues, NominalToBinary, NominalToBinary, NominalToString, NonSparseToSparse, NumericTransform, Obfuscate, PartitionMembership, PotentialClassIgnorer, PrincipalComponents, Randomize, RandomProjection, Remove, RemoveFolds, RemoveFrequentValues, RemoveMisclassified, RemovePercentage, RemoveRange, RemoveType, RemoveUseless, RemoveWithValues, RenameNominalValues, RenameRelation, Reorder, Resample, Resample, ReservoirSample, SimpleFilter, SparseToNonSparse, SpreadSubsample, StratifiedRemoveFolds, StringToNominal, StringToWordVector, SwapValues

An abstract class for instance filters: objects that take instances as input, carry out some transformation on the instance and then output the instance. The method implementations in this class assume that most of the work will be done in the methods overridden by subclasses.

A simple example of filter use. This example doesn't remove instances from the output queue until all instances have been input, so it has higher memory consumption than an approach that uses output instances as they are made available:

  Filter filter = ...;   // some type of filter
  Instances data = ...;  // some instances
  filter.setInputFormat(data);
  for (int i = 0; i < data.numInstances(); i++) {
    filter.input(data.instance(i));
  }
  filter.batchFinished();
  Instances newData = filter.getOutputFormat();
  Instance processed;
  while ((processed = filter.output()) != null) {
    newData.add(processed);
  }
  // ... do something with newData ...
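The overview mentions an approach that collects output instances as soon as they are made available. The sketch below models that approach with plain Java strings instead of Weka's Instance objects. UpperCaseMiniFilter and StreamingDemo are hypothetical names, not Weka classes; the sketch only imitates the input()/batchFinished()/output() contract described on this page.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for an incremental filter (not part of Weka). */
class UpperCaseMiniFilter {
  private final Queue<String> outputQueue = new ArrayDeque<>();

  /** Processes the input immediately; returns true because output is ready. */
  boolean input(String instance) {
    outputQueue.add(instance.toUpperCase());
    return true;
  }

  /** Nothing is held back, so this only reports whether output is pending. */
  boolean batchFinished() {
    return !outputQueue.isEmpty();
  }

  /** Removes and returns the next filtered item, or null if none is queued. */
  String output() {
    return outputQueue.poll();
  }
}

class StreamingDemo {
  /** Drains output as soon as input() reports it is available. */
  static List<String> filterStreaming(List<String> data) {
    UpperCaseMiniFilter filter = new UpperCaseMiniFilter();
    List<String> result = new ArrayList<>();
    for (String item : data) {
      if (filter.input(item)) {
        String processed;
        while ((processed = filter.output()) != null) {
          result.add(processed);
        }
      }
    }
    // Collect anything a filter might only release at the end of the batch.
    if (filter.batchFinished()) {
      String processed;
      while ((processed = filter.output()) != null) {
        result.add(processed);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(filterStreaming(List.of("a", "b", "c"))); // [A, B, C]
  }
}
```

Because each item is taken off the queue right after input(), at most one filtered item is held in memory at a time for a filter like this one.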
 
Version:
$Revision: 14804 $
Author:
Len Trigg (trigg@cs.waikato.ac.nz)
  • Constructor Details

    • Filter

      public Filter()
  • Method Details

    • isNewBatch

      public boolean isNewBatch()
      Returns true if a new batch was started, i.e., either a new instance of the filter was created or the batchFinished() method was called.
      Returns:
      true if a new batch has been initiated
    • isFirstBatchDone

      public boolean isFirstBatchDone()
      Returns true if the first batch of instances has been processed. This is necessary for supervised filters, which "learn" from the first batch and then should not be updated by subsequent calls of batchFinished().
      Returns:
      true if the first batch has been processed
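To illustrate how the two batch flags interact, here is a minimal, hypothetical model (not Weka's implementation) of a filter that learns a mean from the first batch and then reuses it for later input. The class name MeanCenteringMiniFilter and its fields are assumptions; only the flag semantics follow the descriptions of isNewBatch() and isFirstBatchDone().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/**
 * Hypothetical model (not Weka code): learns a mean from the first batch only
 * and reuses it for every later input, tracking the two batch flags.
 */
class MeanCenteringMiniFilter {
  private boolean newBatch = true;         // plays the role of isNewBatch()
  private boolean firstBatchDone = false;  // plays the role of isFirstBatchDone()
  private final List<Double> buffer = new ArrayList<>();
  private final Queue<Double> outputQueue = new ArrayDeque<>();
  private double mean;

  boolean isNewBatch() { return newBatch; }

  boolean isFirstBatchDone() { return firstBatchDone; }

  /** Buffers input during the first batch; filters immediately afterwards. */
  boolean input(double value) {
    if (newBatch) {          // first input of a new batch: clear stale output
      outputQueue.clear();
      newBatch = false;
    }
    if (firstBatchDone) {    // mean already learned, so it is not updated
      outputQueue.add(value - mean);
      return true;
    }
    buffer.add(value);
    return false;
  }

  /** Learns the mean once, then only marks the start of a new batch. */
  boolean batchFinished() {
    if (!firstBatchDone) {
      double sum = 0;
      for (double v : buffer) sum += v;
      mean = buffer.isEmpty() ? 0 : sum / buffer.size();
      for (double v : buffer) outputQueue.add(v - mean);
      buffer.clear();
      firstBatchDone = true;
    }
    newBatch = true;
    return !outputQueue.isEmpty();
  }

  Double output() { return outputQueue.poll(); }

  public static void main(String[] args) {
    MeanCenteringMiniFilter f = new MeanCenteringMiniFilter();
    for (double v : new double[] {1, 2, 3}) f.input(v);
    f.batchFinished();                  // learns mean = 2.0
    Double d;
    while ((d = f.output()) != null) System.out.print(d + " ");
    System.out.println();               // printed: -1.0 0.0 1.0
    f.input(10);                        // second batch reuses mean = 2.0
    System.out.println(f.output());     // printed: 8.0
  }
}
```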
    • mayRemoveInstanceAfterFirstBatchDone

      public boolean mayRemoveInstanceAfterFirstBatchDone()
      Default implementation returns false. Some filters may not be able to produce an output instance for every input instance after the first batch has been completed; such filters should override this method and return true.
      Returns:
      false by default
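A rough sketch of why callers should check the return value of input() when a filter may remove instances: the hypothetical DropNegativesMiniFilter (not a Weka class) produces no output for negative values, so the caller collects output only when input() returns true instead of assuming one output per input.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical filter (not Weka code) that silently drops negative values. */
class DropNegativesMiniFilter {
  private final Queue<Integer> outputQueue = new ArrayDeque<>();

  /** This filter may produce no output for an input, so it returns true. */
  boolean mayRemoveInstanceAfterFirstBatchDone() { return true; }

  /** Returns false when the value was dropped and there is nothing to collect. */
  boolean input(int value) {
    if (value < 0) return false;
    outputQueue.add(value);
    return true;
  }

  Integer output() { return outputQueue.poll(); }

  /** Caller-side loop that does not assume one output per input. */
  static List<Integer> filterEach(DropNegativesMiniFilter filter, int... values) {
    List<Integer> kept = new ArrayList<>();
    for (int v : values) {
      if (filter.input(v)) {
        kept.add(filter.output());
      } else if (!filter.mayRemoveInstanceAfterFirstBatchDone()) {
        // A missing output is only expected from filters that declare they may remove instances.
        throw new IllegalStateException("No output produced for " + v);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    System.out.println(filterEach(new DropNegativesMiniFilter(), 3, -1, 4)); // [3, 4]
  }
}
```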
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the Capabilities of this filter. Derived filters have to override this method to enable capabilities.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Returns:
      the capabilities of this object
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision
    • getCapabilities

      public Capabilities getCapabilities(Instances data)
      Returns the Capabilities of this filter, customized based on the data. That is, it removes all class capabilities if no class attribute is present, or removes the NO_CLASS capability if a class attribute is present.
      Parameters:
      data - the data to use for customization
      Returns:
      the capabilities of this object, based on the data
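The customization rule can be sketched with a plain EnumSet. Cap is a hypothetical, reduced stand-in whose constant names are borrowed from Weka's capability list (the real list is much larger), and the sketch implements only the two rules stated above, not Weka's actual getCapabilities(Instances) logic.

```java
import java.util.EnumSet;

/** Simplified stand-in for a few capability flags (hypothetical, not Weka's enum). */
enum Cap { NUMERIC_ATTRIBUTES, NOMINAL_CLASS, NUMERIC_CLASS, MISSING_CLASS_VALUES, NO_CLASS }

class CapabilitiesDemo {
  /** Returns a copy of base adjusted to whether the data has a class attribute. */
  static EnumSet<Cap> customize(EnumSet<Cap> base, boolean hasClassAttribute) {
    EnumSet<Cap> result = EnumSet.copyOf(base);
    if (hasClassAttribute) {
      result.remove(Cap.NO_CLASS);  // a class is present, so "no class" no longer applies
    } else {
      // no class attribute: drop every class-related capability
      result.removeAll(EnumSet.of(Cap.NOMINAL_CLASS, Cap.NUMERIC_CLASS, Cap.MISSING_CLASS_VALUES));
    }
    return result;
  }

  public static void main(String[] args) {
    EnumSet<Cap> all = EnumSet.allOf(Cap.class);
    System.out.println(customize(all, false)); // [NUMERIC_ATTRIBUTES, NO_CLASS]
    System.out.println(customize(all, true));  // [NUMERIC_ATTRIBUTES, NOMINAL_CLASS, NUMERIC_CLASS, MISSING_CLASS_VALUES]
  }
}
```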
    • getCopyOfInputFormat

      public Instances getCopyOfInputFormat()
      Gets a copy of just the structure of the input format instances.
      Returns:
      a copy of the structure (attribute information) of the input format instances
    • setInputFormat

      public boolean setInputFormat(Instances instanceInfo) throws Exception
      Sets the format of the input instances. If the filter is able to determine the output format before seeing any input instances, it does so here. This default implementation clears the output format and output queue, and sets the new batch flag. Overriders should call super.setInputFormat(Instances).
      Parameters:
      instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
      Returns:
      true if the outputFormat may be collected immediately
      Throws:
      Exception - if the inputFormat can't be set successfully
    • getOutputFormat

      public Instances getOutputFormat()
      Gets the format of the output instances. This should only be called after input() or batchFinished() has returned true. The relation name of the output instances should be changed to reflect the action of the filter (e.g., by adding the filter name and options).
      Returns:
      an Instances object containing the output instance structure only.
      Throws:
      NullPointerException - if no input structure has been defined (or the output format hasn't been determined yet)
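The following hypothetical sketch (plain Java, not Weka's Instances API) models a filter that can determine its output format from the input header alone, so setInputFormat() returns true and getOutputFormat() is available immediately. The Header class and the "-DropFirst" relation-name suffix are made up for illustration.

```java
import java.util.List;

/** Hypothetical header model: relation name plus attribute names, no data rows. */
class Header {
  final String relationName;
  final List<String> attributes;

  Header(String relationName, List<String> attributes) {
    this.relationName = relationName;
    this.attributes = List.copyOf(attributes);
  }
}

/** Hypothetical filter (not Weka code) that drops the first attribute. */
class DropFirstAttributeMiniFilter {
  private Header outputFormat;

  /** Returns true: the output format follows from the input header alone. */
  boolean setInputFormat(Header input) {
    outputFormat = null;  // reset any format left over from a previous input format
    if (input.attributes.isEmpty()) {
      throw new IllegalArgumentException("Input format has no attributes to drop");
    }
    // Record the filter's action in the relation name (this naming scheme is made up).
    outputFormat = new Header(input.relationName + "-DropFirst",
        input.attributes.subList(1, input.attributes.size()));
    return true;
  }

  /** Throws NullPointerException if no output format has been determined yet. */
  Header getOutputFormat() {
    if (outputFormat == null) throw new NullPointerException("No output format defined.");
    return outputFormat;
  }

  public static void main(String[] args) {
    DropFirstAttributeMiniFilter f = new DropFirstAttributeMiniFilter();
    f.setInputFormat(new Header("iris", List.of("id", "petallength", "class")));
    Header out = f.getOutputFormat();
    System.out.println(out.relationName + " " + out.attributes); // iris-DropFirst [petallength, class]
  }
}
```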
    • input

      public boolean input(Instance instance) throws Exception
      Input an instance for filtering. Ordinarily the instance is processed and made available for output immediately. Some filters require that all instances be read before producing output, in which case output instances should be collected after calling batchFinished(). If the input marks the start of a new batch, the output queue is cleared. This default implementation assumes all instance conversion will occur when batchFinished() is called.
      Parameters:
      instance - the input instance
      Returns:
      true if the filtered instance may now be collected with output().
      Throws:
      NullPointerException - if the input format has not been defined.
      Exception - if the input instance was not of the correct format or if there was a problem with the filtering.
    • batchFinished

      public boolean batchFinished() throws Exception
      Signify that this batch of input to the filter is finished. If the filter requires all instances prior to filtering, output() may now be called to retrieve the filtered instances. Any subsequent instances should be filtered based on settings obtained from the first batch (unless the input format has been re-assigned or new options have been set). This default implementation assumes all instance processing occurs during setInputFormat() and input().
      Returns:
      true if there are instances pending output
      Throws:
      NullPointerException - if no input structure has been defined.
      Exception - if there was a problem finishing the batch.
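A minimal model of a filter that needs the whole batch before producing output: input() always returns false, and the filtered values only become available after batchFinished(). MaxScalingMiniFilter is hypothetical (not a Weka class), and its no-argument setInputFormat() is a simplification of the real method that only marks the input format as defined.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical batch filter (not Weka code) that scales values by the batch maximum. */
class MaxScalingMiniFilter {
  private boolean inputFormatDefined = false;
  private final List<Double> pending = new ArrayList<>();
  private final Queue<Double> outputQueue = new ArrayDeque<>();

  /** Simplified stand-in for setInputFormat(): no header, just resets state. */
  void setInputFormat() {
    inputFormatDefined = true;
    pending.clear();
    outputQueue.clear();
  }

  /** Always false: nothing can be scaled before the whole batch has been seen. */
  boolean input(double value) {
    if (!inputFormatDefined) throw new NullPointerException("No input instance format defined");
    pending.add(value);
    return false;
  }

  /** Scales the buffered values and reports whether any output is pending. */
  boolean batchFinished() {
    if (!inputFormatDefined) throw new NullPointerException("No input instance format defined");
    double max = 0;
    for (double v : pending) max = Math.max(max, Math.abs(v));
    for (double v : pending) outputQueue.add(max == 0 ? 0.0 : v / max);
    pending.clear();
    return !outputQueue.isEmpty();
  }

  Double output() { return outputQueue.poll(); }

  public static void main(String[] args) {
    MaxScalingMiniFilter f = new MaxScalingMiniFilter();
    f.setInputFormat();
    f.input(2);
    f.input(4);
    if (f.batchFinished()) {
      Double d;
      while ((d = f.output()) != null) System.out.print(d + " "); // 0.5 1.0
    }
    System.out.println();
  }
}
```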
    • output

      public Instance output()
      Outputs an instance after filtering and removes it from the output queue.
      Returns:
      the instance that has most recently been filtered (or null if the queue is empty).
      Throws:
      NullPointerException - if no output structure has been defined
    • outputPeek

      public Instance outputPeek()
      Outputs an instance after filtering but does not remove it from the output queue.
      Returns:
      the instance that has most recently been filtered (or null if the queue is empty).
      Throws:
      NullPointerException - if no input structure has been defined
    • numPendingOutput

      public int numPendingOutput()
      Returns the number of instances pending output.
      Returns:
      the number of instances pending output
      Throws:
      NullPointerException - if no input structure has been defined
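The behavior of output(), outputPeek() and numPendingOutput() can be modeled with an ArrayDeque. This is a hypothetical sketch that assumes first-in, first-out order; it is not Weka's internal queue class.

```java
import java.util.ArrayDeque;

/** Hypothetical model (not Weka code) of a filter's first-in, first-out output queue. */
class OutputQueueModel<T> {
  private final ArrayDeque<T> queue = new ArrayDeque<>();

  /** What a filter does internally when an instance finishes filtering. */
  void push(T item) { queue.addLast(item); }

  /** Like output(): removes and returns the head, or null if the queue is empty. */
  T output() { return queue.pollFirst(); }

  /** Like outputPeek(): returns the head without removing it, or null if empty. */
  T outputPeek() { return queue.peekFirst(); }

  /** Like numPendingOutput(): how many filtered items are waiting. */
  int numPendingOutput() { return queue.size(); }

  public static void main(String[] args) {
    OutputQueueModel<String> q = new OutputQueueModel<>();
    q.push("first");
    q.push("second");
    System.out.println(q.outputPeek() + " " + q.numPendingOutput()); // first 2
    System.out.println(q.output() + " " + q.numPendingOutput());     // first 1
  }
}
```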
    • isOutputFormatDefined

      public boolean isOutputFormatDefined()
      Returns whether the output format is ready to be collected.
      Returns:
      true if the output format is set
    • makeCopy

      public static Filter makeCopy(Filter model) throws Exception
      Creates a deep copy of the given filter using serialization.
      Parameters:
      model - the filter to copy
      Returns:
      a deep copy of the filter
      Throws:
      Exception - if an error occurs
    • makeCopies

      public static Filter[] makeCopies(Filter model, int num) throws Exception
      Creates a given number of deep copies of the given filter using serialization.
      Parameters:
      model - the filter to copy
      num - the number of filter copies to create.
      Returns:
      an array of filters.
      Throws:
      Exception - if an error occurs
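makeCopy() and makeCopies() are described as creating deep copies via serialization. The generic technique looks roughly like this in plain Java. SerializationCopy and FilterSettings are hypothetical names, and unlike the Weka methods this sketch wraps failures in an unchecked IllegalStateException instead of throwing Exception.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

class SerializationCopy {
  /** Deep-copies model by serializing it to bytes and deserializing the result. */
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T deepCopy(T model) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(model);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Could not copy " + model, e);
    }
  }

  /** Creates num independent deep copies of model. */
  static <T extends Serializable> List<T> deepCopies(T model, int num) {
    if (num < 0) throw new IllegalArgumentException("num must not be negative");
    List<T> copies = new ArrayList<>(num);
    for (int i = 0; i < num; i++) copies.add(deepCopy(model));
    return copies;
  }
}

/** Hypothetical serializable object with mutable state, standing in for a filter. */
class FilterSettings implements Serializable {
  private static final long serialVersionUID = 1L;
  final List<String> options = new ArrayList<>();

  public static void main(String[] args) {
    FilterSettings original = new FilterSettings();
    original.options.add("-D");
    FilterSettings copy = SerializationCopy.deepCopy(original);
    copy.options.add("-do-not-check-capabilities");
    System.out.println(original.options + " " + copy.options); // [-D] [-D, -do-not-check-capabilities]
  }
}
```

Because the copy is rebuilt from bytes, mutating the copy's nested state leaves the original untouched, which is what distinguishes it from a shallow clone.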
    • useFilter

      public static Instances useFilter(Instances data, Filter filter) throws Exception
      Filters an entire set of instances through a filter and returns the new set.
      Parameters:
      data - the data to be filtered
      filter - the filter to be used
      Returns:
      the filtered set of data
      Throws:
      Exception - if the filter can't be used successfully
    • toString

      public String toString()
      Returns a description of the filter, by default only the classname.
      Overrides:
      toString in class Object
      Returns:
      a string describing the filter
    • wekaStaticWrapper

      public static String wekaStaticWrapper(Sourcable filter, String className, Instances input, Instances output) throws Exception
      Generates source code from the filter.
      Parameters:
      filter - the filter to output as source
      className - the name of the generated class
      input - the input data the header is generated for
      output - the output data the header is generated for
      Returns:
      the generated source code
      Throws:
      Exception - if source code cannot be generated
    • filterFile

      public static void filterFile(Filter filter, String[] options) throws Exception
      Method for testing filters.
      Parameters:
      filter - the filter to use
      options - should contain the following arguments:
      -i input_file
      -o output_file
      -c class_index
      -z classname (for filters implementing weka.filters.Sourcable)
      -decimal num (the number of decimal places to use in the output; default = 6)
      or -h for help on options
      Throws:
      Exception - if something goes wrong or the user requests help on command options
    • batchFilterFile

      public static void batchFilterFile(Filter filter, String[] options) throws Exception
      Method for testing a filter's ability to process multiple batches.
      Parameters:
      filter - the filter to use
      options - should contain the following arguments:
      -i (first) input file
      -o (first) output file
      -r (second) input file
      -s (second) output file
      -c class_index
      -z classname (for filters implementing weka.filters.Sourcable)
      -decimal num (the number of decimal places to use in the output; default = 6)
      or -h for help on options
      Throws:
      Exception - if something goes wrong or the user requests help on command options
    • runFilter

      public static void runFilter(Filter filter, String[] options)
      Runs the filter instance with the given options.
      Parameters:
      filter - the filter to run
      options - the commandline options
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options. Valid options are:

      -D
      If set, filter is run in debug mode and may output additional info to the console.

      -do-not-check-capabilities
      If set, filter capabilities are not checked before filter is built (use with caution).

      Specified by:
      setOptions in interface OptionHandler
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the filter.
      Specified by:
      getOptions in interface OptionHandler
      Returns:
      an array of strings suitable for passing to setOptions
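A hypothetical sketch of the round trip between setOptions() and getOptions() for the two base options, -D and -do-not-check-capabilities. It is not Weka's option parser: real subclasses add many more options, and rejecting every unrecognized option here is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model (not Weka's parser) of the two base options of Filter. */
class BaseFilterOptions {
  boolean debug;
  boolean doNotCheckCapabilities;

  /** Resets both flags, then sets those present; unknown options are rejected. */
  void setOptions(String[] options) {
    debug = false;
    doNotCheckCapabilities = false;
    for (String option : options) {
      switch (option) {
        case "-D":
          debug = true;
          break;
        case "-do-not-check-capabilities":
          doNotCheckCapabilities = true;
          break;
        case "":
          break;  // tolerate blank entries
        default:
          throw new IllegalArgumentException("Option not supported: " + option);
      }
    }
  }

  /** Returns options that setOptions() parses back into the same settings. */
  String[] getOptions() {
    List<String> result = new ArrayList<>();
    if (debug) result.add("-D");
    if (doNotCheckCapabilities) result.add("-do-not-check-capabilities");
    return result.toArray(new String[0]);
  }

  public static void main(String[] args) {
    BaseFilterOptions opts = new BaseFilterOptions();
    opts.setOptions(new String[] {"-D"});
    System.out.println(Arrays.toString(opts.getOptions())); // [-D]
  }
}
```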
    • setDebug

      public void setDebug(boolean debug)
      Set debugging mode.
      Parameters:
      debug - true if debug output should be printed
    • getDebug

      public boolean getDebug()
      Get whether debugging is turned on.
      Returns:
      true if debugging output is on
    • debugTipText

      public String debugTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setDoNotCheckCapabilities

      public void setDoNotCheckCapabilities(boolean doNotCheckCapabilities)
      Set whether to skip checking capabilities.
      Specified by:
      setDoNotCheckCapabilities in interface CapabilitiesIgnorer
      Parameters:
      doNotCheckCapabilities - true if capabilities are not to be checked.
    • getDoNotCheckCapabilities

      public boolean getDoNotCheckCapabilities()
      Get whether capabilities checking is turned off.
      Specified by:
      getDoNotCheckCapabilities in interface CapabilitiesIgnorer
      Returns:
      true if capabilities checking is turned off.
    • doNotCheckCapabilitiesTipText

      public String doNotCheckCapabilitiesTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • preExecution

      public void preExecution() throws Exception
      Performs any setup that might need to happen before command-line execution. Subclasses should override this method if they need to do something here.
      Specified by:
      preExecution in interface CommandlineRunnable
      Throws:
      Exception - if a problem occurs during setup
    • run

      public void run(Object toRun, String[] options) throws Exception
      Execute the supplied object.
      Specified by:
      run in interface CommandlineRunnable
      Parameters:
      toRun - the object to execute
      options - any options to pass to the object
      Throws:
      Exception - if the object is not of the expected type.
    • postExecution

      public void postExecution() throws Exception
      Performs any teardown that might need to happen after execution. Subclasses should override this method if they need to do something here.
      Specified by:
      postExecution in interface CommandlineRunnable
      Throws:
      Exception - if a problem occurs during teardown
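The three CommandlineRunnable hooks can be modeled as below. LifecycleRunnable and LifecycleDemo are hypothetical names, and the choice to run postExecution() even when run() fails (via finally) is a design assumption of this sketch, not something the method descriptions above state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical copy of the three-step lifecycle; not Weka's CommandlineRunnable itself. */
interface LifecycleRunnable {
  void preExecution() throws Exception;
  void run(Object toRun, String[] options) throws Exception;
  void postExecution() throws Exception;
}

class LifecycleDemo implements LifecycleRunnable {
  final List<String> log = new ArrayList<>();

  public void preExecution() { log.add("pre"); }

  public void run(Object toRun, String[] options) {
    if (!(toRun instanceof String)) {  // stands in for "not of the expected type"
      throw new IllegalArgumentException("Expected a String, got " + toRun);
    }
    log.add("run " + toRun + " " + String.join(" ", options));
  }

  public void postExecution() { log.add("post"); }

  /** Calls the hooks in order; teardown runs even if run() fails (an assumption). */
  static void execute(LifecycleRunnable runnable, Object toRun, String[] options) {
    try {
      runnable.preExecution();
      try {
        runnable.run(toRun, options);
      } finally {
        runnable.postExecution();
      }
    } catch (RuntimeException e) {
      throw e;
    } catch (Exception e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    LifecycleDemo demo = new LifecycleDemo();
    execute(demo, "filter", new String[] {"-D"});
    System.out.println(demo.log); // [pre, run filter -D, post]
  }
}
```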
    • main

      public static void main(String[] args)
      Main method for testing this class.
      Parameters:
      args - should contain arguments to the filter: use -h for help