Class Sorter

java.lang.Object
weka.knowledgeflow.steps.BaseStep
weka.knowledgeflow.steps.Sorter
All Implemented Interfaces:
Serializable, BaseStepExtender, Step

@KFStep(name="Sorter", category="Tools", toolTipText="Sort instances in ascending or descending order according to the values of user-specified attributes. Instances can be sorted according to multiple attributes (defined in order). Handles datasets larger than can be fit into main memory via instance connections and specifying the in-memory buffer size. Implements a merge-sort by writing the sorted in-memory buffer to a file when full and then interleaving instances from the disk-based file(s) when the incoming stream has finished.", iconPath="weka/gui/knowledgeflow/icons/Sorter.gif") public class Sorter extends BaseStep
Step for sorting instances according to one or more attributes.
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}com)
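The tool tip above describes a merge sort that spills a full in-memory buffer to a file and interleaves the files once the stream ends. The following is a minimal stand-alone sketch of that general technique, not the Sorter source: the class `SpillSortSketch`, its methods, and the use of one text line per value are all assumptions made for illustration, and plain strings stand in for Weka instances.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

/** Hypothetical sketch of a spill-to-disk merge sort; not part of Weka. */
final class SpillSortSketch {

  /** The current value of one sorted run, paired with the reader it came from. */
  private static final class RunHead {
    final String value;
    final BufferedReader reader;
    RunHead(String value, BufferedReader reader) {
      this.value = value;
      this.reader = reader;
    }
  }

  /**
   * Emits the streamed values to {@code out} in ascending order while buffering
   * at most {@code bufferSize} of them in memory. Values must not contain line
   * breaks. A null {@code tempDir} means the JVM default temp directory.
   */
  static void sort(Iterator<String> stream, int bufferSize, File tempDir,
                   Consumer<String> out) {
    if (bufferSize < 1) {
      throw new IllegalArgumentException("bufferSize must be positive");
    }
    List<File> runs = new ArrayList<>();
    try {
      List<String> buffer = new ArrayList<>(bufferSize);
      while (stream.hasNext()) {
        buffer.add(stream.next());
        if (buffer.size() == bufferSize) {   // buffer full: spill a sorted run
          runs.add(writeRun(buffer, tempDir));
          buffer.clear();
        }
      }
      if (runs.isEmpty()) {                  // everything fit in memory
        Collections.sort(buffer);
        buffer.forEach(out);
        return;
      }
      if (!buffer.isEmpty()) {               // spill the final partial buffer
        runs.add(writeRun(buffer, tempDir));
      }
      mergeRuns(runs, out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      for (File run : runs) {
        run.delete();                        // best-effort cleanup
      }
    }
  }

  /** Sorts the buffer and writes it to a new temp file, one value per line. */
  private static File writeRun(List<String> buffer, File tempDir) throws IOException {
    Collections.sort(buffer);
    File run = File.createTempFile("sort-run", ".tmp", tempDir);
    try (BufferedWriter w = Files.newBufferedWriter(run.toPath(), StandardCharsets.UTF_8)) {
      for (String value : buffer) {
        w.write(value);
        w.newLine();
      }
    }
    return run;
  }

  /** Interleaves the sorted runs by repeatedly taking the smallest head value. */
  private static void mergeRuns(List<File> runs, Consumer<String> out) throws IOException {
    PriorityQueue<RunHead> heads =
        new PriorityQueue<>((a, b) -> a.value.compareTo(b.value));
    List<BufferedReader> readers = new ArrayList<>();
    try {
      for (File run : runs) {
        BufferedReader r = Files.newBufferedReader(run.toPath(), StandardCharsets.UTF_8);
        readers.add(r);
        String first = r.readLine();
        if (first != null) {
          heads.add(new RunHead(first, r));
        }
      }
      while (!heads.isEmpty()) {
        RunHead smallest = heads.poll();
        out.accept(smallest.value);
        String next = smallest.reader.readLine();
        if (next != null) {
          heads.add(new RunHead(next, smallest.reader));
        }
      }
    } finally {
      for (BufferedReader r : readers) {
        r.close();
      }
    }
  }
}
```

With a buffer size of 2 and five input values, the sketch writes three sorted runs and merges them. With a buffer size larger than the input, it never touches the disk.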
  • Constructor Details

    • Sorter

      public Sorter()
  • Method Details

    • getBufferSize

      public String getBufferSize()
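      Get the size of the in-memory buffer, i.e. the number of instances to sort in memory before writing to a temporary file (instance connections only).
      Returns:
      the size of the in-memory buffer, as a string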
    • setBufferSize

      @OptionMetadata(displayName="Size of in-mem streaming buffer", description="Number of instances to sort in memory before writing to a temp file (instance connections only)", displayOrder=1) public void setBufferSize(String buffSize)
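      Set the size of the in-memory buffer, i.e. the number of instances to sort in memory before writing to a temporary file (instance connections only).
      Parameters:
      buffSize - the size of the in-memory buffer, as a string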
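Because the buffer size is passed as a String, a caller may want to check that the value parses as a positive number of instances before calling setBufferSize. The sketch below shows one way to do that. The class `BufferSizeSketch`, the method `parseBufferSize`, and the fallback behaviour are hypothetical; how Sorter itself treats invalid values is not stated on this page.

```java
/** Hypothetical helper for validating a buffer size given as text; not part of Weka. */
final class BufferSizeSketch {

  /**
   * Returns the positive integer encoded in {@code text}, or {@code fallback}
   * when the text is null, not a number, or not positive.
   */
  static int parseBufferSize(String text, int fallback) {
    if (text == null) {
      return fallback;
    }
    try {
      int size = Integer.parseInt(text.trim());
      return size > 0 ? size : fallback;
    } catch (NumberFormatException e) {
      return fallback;  // assumption: fall back silently rather than fail
    }
  }
}
```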
    • setTempDirectory

      @FilePropertyMetadata(fileChooserDialogType=0, directoriesOnly=true) @OptionMetadata(displayName="Directory for temp files", description="Where to store temporary files when spilling to disk", displayOrder=2) public void setTempDirectory(File tempDir)
      Set the directory in which temporary files are stored when the in-memory buffer spills to disk during incremental operation.
      Parameters:
      tempDir - the directory to use for temporary files
    • getTempDirectory

      public File getTempDirectory()
      Get the directory in which temporary files are stored when the in-memory buffer spills to disk during incremental operation.
      Returns:
      the directory used for temporary files
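As an illustration of how a configured temporary directory might be used when spilling, the sketch below creates a uniquely named file inside a given directory with the standard `File.createTempFile` call. The class `TempDirSketch`, the method `newSpillFile`, the file name prefix, and the fallback to the JVM default temp directory are assumptions for illustration, not documented Sorter behaviour.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper showing temp-file creation in a chosen directory; not part of Weka. */
final class TempDirSketch {

  /**
   * Creates an empty spill file in {@code tempDir}. If {@code tempDir} is null
   * or is not an existing directory, the JVM default temp directory is used.
   */
  static File newSpillFile(File tempDir) {
    File dir = (tempDir != null && tempDir.isDirectory()) ? tempDir : null;
    try {
      File spill = File.createTempFile("sorter-spill", ".tmp", dir);
      spill.deleteOnExit();  // safety net in case explicit cleanup is skipped
      return spill;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```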
    • setSortDetails

      @ProgrammaticProperty public void setSortDetails(String sortDetails)
      Set the sort rules to use
      Parameters:
      sortDetails - the sort rules in internal string representation
    • getSortDetails

      public String getSortDetails()
      Get the sort rules to use
      Returns:
      the sort rules in internal string representation
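The internal string representation of the sort rules is not described on this page, so the sketch below does not attempt to parse it. It only illustrates the idea of an ordered list of rules, each naming an attribute and a direction, applied in sequence so that later rules break ties left by earlier ones. The class `SortRuleSketch`, the `Rule` type, and the use of `double[]` rows in place of Weka instances are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of multi-attribute sorting; not part of Weka. */
final class SortRuleSketch {

  /** One sort rule: the column to compare and whether to sort it descending. */
  static final class Rule {
    final int column;
    final boolean descending;
    Rule(int column, boolean descending) {
      this.column = column;
      this.descending = descending;
    }
  }

  /** Builds one comparator that applies the rules in the order given. */
  static Comparator<double[]> comparatorFor(List<Rule> rules) {
    Comparator<double[]> combined = (a, b) -> 0;  // no rules: keep input order
    for (Rule rule : rules) {
      Comparator<double[]> single =
          Comparator.comparingDouble((double[] row) -> row[rule.column]);
      if (rule.descending) {
        single = single.reversed();
      }
      combined = combined.thenComparing(single);
    }
    return combined;
  }

  /** Returns a sorted copy of the rows, leaving the input list untouched. */
  static List<double[]> sorted(List<double[]> rows, List<Rule> rules) {
    List<double[]> copy = new ArrayList<>(rows);
    copy.sort(comparatorFor(rules));
    return copy;
  }
}
```

Sorting on column 0 ascending and then column 1 descending groups rows by the first column and orders each group from the largest second value to the smallest.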
    • stepInit

      public void stepInit() throws WekaException
      Initialize the step.
      Throws:
      WekaException - if a problem occurs during initialization
    • getIncomingConnectionTypes

      public List<String> getIncomingConnectionTypes()
      Get a list of incoming connection types that this step can accept. Ideally (and if appropriate), this should take into account the state of the step and any existing incoming connections. E.g. a step might be able to accept one (and only one) incoming batch data connection.
      Returns:
      a list of incoming connections that this step can accept given its current state
    • getOutgoingConnectionTypes

      public List<String> getOutgoingConnectionTypes()
      Get a list of outgoing connection types that this step can produce. Ideally (and if appropriate), this should take into account the state of the step and the incoming connections. E.g. depending on what incoming connection is present, a step might be able to produce a trainingSet output, a testSet output or neither, but not both.
      Returns:
      a list of outgoing connections that this step can produce
    • processIncoming

      public void processIncoming(Data data) throws WekaException
      Process an incoming data payload (if the step accepts incoming connections).
      Specified by:
      processIncoming in interface BaseStepExtender
      Specified by:
      processIncoming in interface Step
      Overrides:
      processIncoming in class BaseStep
      Parameters:
      data - the data to process
      Throws:
      WekaException - if a problem occurs
    • getCustomEditorForStep

      public String getCustomEditorForStep()
      Return the fully qualified name of a custom editor component (JComponent) to use for editing the properties of the step. This method can return null, in which case the system will dynamically generate an editor using the GenericObjectEditor.
      Specified by:
      getCustomEditorForStep in interface Step
      Overrides:
      getCustomEditorForStep in class BaseStep
      Returns:
      the fully qualified name of a step editor component