Class Join

All Implemented Interfaces:
Serializable, BaseStepExtender, Step

@KFStep(name="Join", category="Flow", toolTipText="Performs an inner join on two incoming datasets/instance streams (IMPORTANT: assumes that both datasets are sorted in ascending order of the key fields). If data is not sorted then use a Sorter step to sort both into ascending order of the key fields. Does not handle the case where keys are not unique in one or both inputs.", iconPath="weka/gui/knowledgeflow/icons/Join.gif")
public class Join extends BaseStep
Step that performs an inner join on one or more key fields from two incoming batch or streaming datasets.
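The sorted-input requirement in the tool tip suggests a merge-style join. The following is a minimal sketch of that general technique in plain Java, not the code used by this class. The `Row` type, the `innerJoin` helper and the output format are hypothetical and exist only for illustration. Like the step, the sketch assumes both inputs are sorted in ascending key order and that keys are unique within each input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedInnerJoinSketch {

    /** Hypothetical row type: a single integer key plus one payload value. */
    static final class Row {
        final int key;
        final String value;

        Row(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Inner join of two inputs that are already sorted in ascending key order.
     * Assumes keys are unique within each input (duplicates are not handled).
     */
    static List<String> innerJoin(List<Row> first, List<Row> second) {
        List<String> joined = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < first.size() && j < second.size()) {
            Row a = first.get(i);
            Row b = second.get(j);
            if (a.key < b.key) {
                i++;            // key only in the first input so far: skip it
            } else if (a.key > b.key) {
                j++;            // key only in the second input so far: skip it
            } else {
                // Matching keys: emit one joined row and advance both inputs
                joined.add(a.key + ":" + a.value + "," + b.value);
                i++;
                j++;
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Row> first = Arrays.asList(new Row(1, "a"), new Row(2, "b"), new Row(4, "d"));
        List<Row> second = Arrays.asList(new Row(2, "x"), new Row(3, "y"), new Row(4, "z"));
        System.out.println(innerJoin(first, second)); // prints [2:b,x, 4:d,z]
    }
}
```

Because each input is read once from front to back, unsorted input would cause matches to be missed silently, which is why a Sorter step is recommended upstream.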
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}com)
  • Field Details

    • KEY_SPEC_SEPARATOR

      public static final String KEY_SPEC_SEPARATOR
      Separator used to separate first and second input key specifications
  • Constructor Details

    • Join

      public Join()
  • Method Details

    • setKeySpec

      public void setKeySpec(String ks)
      Set the key specification in internal format: the first input's keys (k11,k12,...,k1n), then KEY_SPEC_SEPARATOR, then the second input's keys (k21,k22,...,k2n), with no spaces around the separator
      Parameters:
      ks - the key specification
    • getKeySpec

      public String getKeySpec()
      Get the key specification in internal format: the first input's keys (k11,k12,...,k1n), then KEY_SPEC_SEPARATOR, then the second input's keys (k21,k22,...,k2n), with no spaces around the separator
      Returns:
      the key specification
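The internal key specification format used by setKeySpec and getKeySpec can be illustrated with a small sketch. It is written in plain Java without a Weka dependency, so the `SEPARATOR` constant below is a placeholder and not the actual value of KEY_SPEC_SEPARATOR; in real code the constant from this class would be used instead. The `buildKeySpec` and `parseKeySpec` helpers and the field names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class KeySpecSketch {

    // Placeholder only: stands in for Join.KEY_SPEC_SEPARATOR, whose real
    // value is not given in this documentation.
    static final String SEPARATOR = "|SEP|";

    /** Builds "k11,...,k1n" + SEPARATOR + "k21,...,k2n". */
    static String buildKeySpec(List<String> firstKeys, List<String> secondKeys) {
        return String.join(",", firstKeys) + SEPARATOR + String.join(",", secondKeys);
    }

    /** Splits a key specification back into the two per-input key lists. */
    static List<List<String>> parseKeySpec(String spec) {
        int idx = spec.indexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("Missing separator in key spec: " + spec);
        }
        String first = spec.substring(0, idx);
        String second = spec.substring(idx + SEPARATOR.length());
        return Arrays.asList(
            Arrays.asList(first.split(",")),
            Arrays.asList(second.split(",")));
    }

    public static void main(String[] args) {
        String spec = buildKeySpec(
            Arrays.asList("id", "date"),
            Arrays.asList("cust_id", "day"));
        System.out.println(spec);               // prints id,date|SEP|cust_id,day
        System.out.println(parseKeySpec(spec)); // prints [[id, date], [cust_id, day]]
    }
}
```

A string built this way would be the argument passed to setKeySpec, with key i of the first list paired against key i of the second.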
    • getConnectedInputNames

      public List<String> getConnectedInputNames()
      Get the names of the connected steps as a list
      Returns:
      the names of the connected steps as a list
    • getFirstInputStructure

      public Instances getFirstInputStructure() throws WekaException
      Get the Instances structure being produced by the first input
      Returns:
      the Instances structure from the first input
      Throws:
      WekaException - if a problem occurs
    • getSecondInputStructure

      public Instances getSecondInputStructure() throws WekaException
      Get the Instances structure being produced by the second input
      Returns:
      the Instances structure from the second input
      Throws:
      WekaException - if a problem occurs
    • stepInit

      public void stepInit() throws WekaException
      Initialize the step
      Throws:
      WekaException - if a problem occurs
    • processIncoming

      public void processIncoming(Data data) throws WekaException
      Process some incoming data
      Specified by:
      processIncoming in interface BaseStepExtender
      Specified by:
      processIncoming in interface Step
      Overrides:
      processIncoming in class BaseStep
      Parameters:
      data - the data to process
      Throws:
      WekaException - if a problem occurs
    • getIncomingConnectionTypes

      public List<String> getIncomingConnectionTypes()
      Get a list of incoming connection types that this step can accept. Ideally (and if appropriate), this should take into account the state of the step and any existing incoming connections. E.g. a step might be able to accept one (and only one) incoming batch data connection.
      Returns:
      a list of incoming connections that this step can accept given its current state
    • getOutgoingConnectionTypes

      public List<String> getOutgoingConnectionTypes()
      Get a list of outgoing connection types that this step can produce. Ideally (and if appropriate), this should take into account the state of the step and the incoming connections. E.g. depending on what incoming connection is present, a step might be able to produce a trainingSet output, a testSet output or neither, but not both.
      Returns:
      a list of outgoing connections that this step can produce
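The state-dependent behaviour described for getIncomingConnectionTypes and getOutgoingConnectionTypes can be sketched in plain Java. This mirrors only the generic examples given in the two descriptions (a step that accepts exactly one batch connection, and a step whose output type follows its input type); it is not the logic used by Join. The class, the method names and the connection type strings are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ConnectionTypesSketch {

    /**
     * Hypothetical step that accepts one (and only one) incoming
     * batch data connection.
     */
    static List<String> incomingTypes(int numIncomingConnections) {
        List<String> result = new ArrayList<>();
        if (numIncomingConnections == 0) {
            result.add("dataSet"); // illustrative connection type name
        }
        return result;             // empty once a connection exists
    }

    /**
     * Hypothetical step that can produce a trainingSet output, a testSet
     * output or neither, depending on its incoming connection, but not both.
     */
    static List<String> outgoingTypes(List<String> incomingConnections) {
        List<String> result = new ArrayList<>();
        if (incomingConnections.contains("trainingSet")) {
            result.add("trainingSet");
        } else if (incomingConnections.contains("testSet")) {
            result.add("testSet");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(incomingTypes(0)); // prints [dataSet]
        System.out.println(incomingTypes(1)); // prints []
        List<String> incoming = new ArrayList<>();
        incoming.add("testSet");
        System.out.println(outgoingTypes(incoming)); // prints [testSet]
    }
}
```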
    • getCustomEditorForStep

      public String getCustomEditorForStep()
      Return the fully qualified name of a custom editor component (JComponent) to use for editing the properties of the step. This method can return null, in which case the system will dynamically generate an editor using the GenericObjectEditor.
      Specified by:
      getCustomEditorForStep in interface Step
      Overrides:
      getCustomEditorForStep in class BaseStep
      Returns:
      the fully qualified name of a step editor component