Class PrincipalComponents

java.lang.Object
weka.filters.Filter
weka.filters.unsupervised.attribute.PrincipalComponents
All Implemented Interfaces:
Serializable, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, UnsupervisedFilter

public class PrincipalComponents extends Filter implements OptionHandler, UnsupervisedFilter
Performs a principal components analysis and transformation of the data.
Dimensionality reduction is accomplished by retaining enough eigenvectors to account for a given proportion of the variance in the original data (default: 0.95, i.e. 95%).
Based on the code of the attribute selection scheme 'PrincipalComponents' by Mark Hall and Gabi Schmidberger.
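
The selection rule described above can be illustrated with a minimal, self-contained sketch. It is not Weka's source code: the class VarianceCutoff, the method componentsToRetain, and the sample eigenvalues are hypothetical, and the sketch assumes the eigenvalues are already sorted in descending order. It counts how many leading components are needed for the cumulative share of variance to reach the target, with an optional cap in the spirit of the -M option.

```java
// Hypothetical illustration; not part of Weka.
final class VarianceCutoff {

    /**
     * Returns how many leading components are needed so that their
     * cumulative share of the total variance reaches coverage.
     * Assumes eigenvalues is sorted in descending order.
     * A maxComponents of -1 means "no cap".
     */
    static int componentsToRetain(double[] eigenvalues, double coverage, int maxComponents) {
        double total = 0.0;
        for (double ev : eigenvalues) {
            total += ev;
        }
        int limit = (maxComponents < 0)
            ? eigenvalues.length
            : Math.min(maxComponents, eigenvalues.length);
        double cumulative = 0.0;
        int kept = 0;
        while (kept < limit) {
            cumulative += eigenvalues[kept];
            kept++;
            if (cumulative / total >= coverage) {
                break; // target proportion of variance reached
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {2.5, 1.0, 0.4, 0.1}; // made-up values
        System.out.println(componentsToRetain(eigenvalues, 0.95, -1));
        System.out.println(componentsToRetain(eigenvalues, 0.95, 2));
    }
}
```

With these made-up eigenvalues the first two components cover 87.5% of the variance and the first three cover 97.5%, so a 0.95 target keeps three components unless the cap stops it earlier.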

Valid options are:

 -C
  Center (rather than standardize) the data and compute PCA using the
  covariance (rather than the correlation) matrix.

 -R <num>
  Retain enough PC attributes to account for this proportion of variance
  in the original data.
  (default: 0.95)

 -A <num>
  Maximum number of attributes to include in transformed attribute names.
  (-1 = include all, default: 5)

 -M <num>
  Maximum number of PC attributes to retain.
  (-1 = include all, default: -1)
 
Version:
$Revision: 12660 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz) -- attribute selection code, Gabi Schmidberger (gabi@cs.waikato.ac.nz) -- attribute selection code, fracpete (fracpete at waikato dot ac dot nz) -- filter code
  • Constructor Details

    • PrincipalComponents

      public PrincipalComponents()
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this filter.
      Returns:
      a description of the filter suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class Filter
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a list of options for this object.

      Valid options are:

       -C
        Center (rather than standardize) the data and compute PCA using the
        covariance (rather than the correlation) matrix.

       -R <num>
        Retain enough PC attributes to account for this proportion of variance
        in the original data.
        (default: 0.95)

       -A <num>
        Maximum number of attributes to include in transformed attribute names.
        (-1 = include all, default: 5)

       -M <num>
        Maximum number of PC attributes to retain.
        (-1 = include all, default: -1)
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class Filter
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the filter.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class Filter
      Returns:
      an array of strings suitable for passing to setOptions
    • centerDataTipText

      public String centerDataTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setCenterData

      public void setCenterData(boolean center)
      Set whether to center (rather than standardize) the data. If set to true then PCA is computed from the covariance rather than correlation matrix.
      Parameters:
      center - true if the data is to be centered rather than standardized
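
To see why centering goes with the covariance matrix and standardizing with the correlation matrix, consider the following self-contained sketch. The class CenterVsStandardize and its data are hypothetical, and this is not Weka's implementation; it assumes sample statistics with an n - 1 denominator. The covariance of two standardized columns equals the correlation of the original columns, so PCA on standardized data does not depend on the units of each attribute, whereas PCA on merely centered data does.

```java
// Hypothetical illustration; not part of Weka.
final class CenterVsStandardize {

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Subtracts the mean; the scale of the column is left unchanged. */
    static double[] center(double[] x) {
        double m = mean(x);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = x[i] - m;
        return out;
    }

    /** Sample covariance (n - 1 denominator). */
    static double covariance(double[] x, double[] y) {
        double[] cx = center(x);
        double[] cy = center(y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) sum += cx[i] * cy[i];
        return sum / (x.length - 1);
    }

    /** Subtracts the mean and divides by the sample standard deviation. */
    static double[] standardize(double[] x) {
        double sd = Math.sqrt(covariance(x, x));
        double[] out = center(x);
        for (int i = 0; i < out.length; i++) out[i] /= sd;
        return out;
    }

    static double correlation(double[] x, double[] y) {
        return covariance(x, y) / Math.sqrt(covariance(x, x) * covariance(y, y));
    }

    public static void main(String[] args) {
        double[] heightCm = {150, 160, 170, 180, 190}; // made-up data
        double[] weightKg = {52, 58, 69, 77, 94};
        System.out.println("covariance of centered columns:     "
            + covariance(center(heightCm), center(weightKg)));
        System.out.println("covariance of standardized columns: "
            + covariance(standardize(heightCm), standardize(weightKg)));
        System.out.println("correlation of original columns:    "
            + correlation(heightCm, weightKg));
    }
}
```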
    • getCenterData

      public boolean getCenterData()
      Get whether to center (rather than standardize) the data. If true then PCA is computed from the covariance rather than correlation matrix.
      Returns:
      true if the data is to be centered rather than standardized.
    • varianceCoveredTipText

      public String varianceCoveredTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setVarianceCovered

      public void setVarianceCovered(double value)
      Sets the proportion of total variance to account for when retaining principal components.
      Parameters:
      value - the proportion of total variance to account for
    • getVarianceCovered

      public double getVarianceCovered()
      Gets the proportion of total variance to account for when retaining principal components.
      Returns:
      the proportion of variance to account for
    • maximumAttributeNamesTipText

      public String maximumAttributeNamesTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMaximumAttributeNames

      public void setMaximumAttributeNames(int value)
      Sets the maximum number of attributes to include in transformed attribute names.
      Parameters:
      value - the maximum number of attributes
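
The effect of this limit can be sketched as follows. The sketch is an assumption about the naming scheme, not Weka's code: it supposes that each transformed attribute is named after its largest-magnitude loadings, each followed by the source attribute name, with "..." marking omitted terms. The class ComponentNames, the method name, and the number formatting are hypothetical; the exact format Weka produces may differ.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;

// Hypothetical illustration; not part of Weka.
final class ComponentNames {

    /**
     * Builds a label for one component from its loadings.
     * A maxNames of -1 means "include every attribute".
     */
    static String name(double[] loadings, String[] attributeNames, int maxNames) {
        Integer[] order = new Integer[loadings.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Largest absolute loading first.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> -Math.abs(loadings[i])));

        int shown = (maxNames < 0) ? order.length : Math.min(maxNames, order.length);
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < shown; k++) {
            double w = loadings[order[k]];
            if (k > 0 && w >= 0) sb.append('+'); // negative values carry their own sign
            sb.append(String.format(Locale.ROOT, "%.3f", w)).append(attributeNames[order[k]]);
        }
        if (shown < order.length) sb.append("..."); // some terms were left out
        return sb.toString();
    }

    public static void main(String[] args) {
        double[] loadings = {0.2, -0.7, 0.5};  // made-up loadings
        String[] names = {"a", "b", "c"};
        System.out.println(name(loadings, names, 2));
        System.out.println(name(loadings, names, -1));
    }
}
```

A small limit keeps attribute names readable on wide datasets; -1 spells out every term.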
    • getMaximumAttributeNames

      public int getMaximumAttributeNames()
      Gets the maximum number of attributes to include in transformed attribute names.
      Returns:
      the maximum number of attributes
    • maximumAttributesTipText

      public String maximumAttributesTipText()
      Returns the tip text for this property.
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setMaximumAttributes

      public void setMaximumAttributes(int value)
      Sets the maximum number of PC attributes to retain.
      Parameters:
      value - the maximum number of attributes
    • getMaximumAttributes

      public int getMaximumAttributes()
      Gets the maximum number of PC attributes to retain.
      Returns:
      the maximum number of attributes
    • getCapabilities

      public Capabilities getCapabilities()
      Returns the capabilities of this filter.
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class Filter
      Returns:
      the capabilities of this filter
    • setInputFormat

      public boolean setInputFormat(Instances instanceInfo) throws Exception
      Sets the format of the input instances.
      Overrides:
      setInputFormat in class Filter
      Parameters:
      instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored - only the structure is required).
      Returns:
      true if the outputFormat may be collected immediately
      Throws:
      Exception - if the input format can't be set successfully
    • input

      public boolean input(Instance instance) throws Exception
      Inputs an instance for filtering. The filter requires all training instances to be read before it produces output.
      Overrides:
      input in class Filter
      Parameters:
      instance - the input instance
      Returns:
      true if the filtered instance may now be collected with output().
      Throws:
      IllegalStateException - if no input format has been set
      Exception - if conversion fails
    • batchFinished

      public boolean batchFinished() throws Exception
      Signify that this batch of input to the filter is finished.
      Overrides:
      batchFinished in class Filter
      Returns:
      true if there are instances pending output
      Throws:
      NullPointerException - if no input structure has been defined
      Exception - if there was a problem finishing the batch
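
The interplay of input, batchFinished, and output for a filter that must see the whole first batch can be illustrated with a toy model. ToyBatchFilter below is hypothetical and independent of Weka: it works on plain double[] rows and only centers columns, standing in for the statistics a real filter would learn. It shows the contract described above: input returns false while the first batch is being buffered, batchFinished makes the buffered rows available, and later rows are transformed immediately with the parameters already learned.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical illustration; not part of Weka.
final class ToyBatchFilter {
    private final List<double[]> buffered = new ArrayList<>();
    private final Queue<double[]> outputQueue = new ArrayDeque<>();
    private double[] means; // learned when the first batch ends

    /** Returns true only if a transformed row is ready for output(). */
    boolean input(double[] row) {
        if (means == null) {
            buffered.add(row.clone()); // first batch: just collect
            return false;
        }
        outputQueue.add(transform(row));
        return true;
    }

    /** Ends the batch; returns true if rows are pending output. */
    boolean batchFinished() {
        if (means == null) {
            if (buffered.isEmpty()) {
                throw new IllegalStateException("no rows were input");
            }
            means = new double[buffered.get(0).length];
            for (double[] row : buffered) {
                for (int j = 0; j < means.length; j++) {
                    means[j] += row[j] / buffered.size();
                }
            }
            for (double[] row : buffered) {
                outputQueue.add(transform(row));
            }
            buffered.clear();
        }
        return !outputQueue.isEmpty();
    }

    /** Returns the next transformed row, or null if none is pending. */
    double[] output() {
        return outputQueue.poll();
    }

    private double[] transform(double[] row) {
        double[] out = new double[row.length];
        for (int j = 0; j < row.length; j++) out[j] = row[j] - means[j];
        return out;
    }
}
```

Rows passed in after the first batch are transformed with the means learned from that batch, which is how a test set would be projected with parameters fitted on training data.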
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class Filter
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for running this filter.
      Parameters:
      args - should contain arguments to the filter: use -h for help