Class TrainTestSplitMaker

java.lang.Object
weka.knowledgeflow.steps.BaseStep
weka.knowledgeflow.steps.TrainTestSplitMaker
All Implemented Interfaces:
Serializable, BaseStepExtender, Step

@KFStep(name="TrainTestSplitMaker", category="Evaluation", toolTipText="A step that randomly splits incoming data into a training and test set", iconPath="weka/gui/knowledgeflow/icons/TrainTestSplitMaker.gif")
public class TrainTestSplitMaker extends BaseStep
A step that creates a random train/test split from an incoming data set.
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}com)
  • Constructor Details

    • TrainTestSplitMaker

      public TrainTestSplitMaker()
  • Method Details

    • setTrainPercent

      @OptionMetadata(displayName="Training percentage", description="The percentage of data to go into the training set", displayOrder=1)
      public void setTrainPercent(String percent)
      Set the percentage of the incoming data that goes into the training set. The remainder goes into the test set.
      Parameters:
      percent - the training percentage, given as a string
    • getTrainPercent

      public String getTrainPercent()
      Get the training percentage
      Returns:
      the training percentage
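The arithmetic behind a percentage split can be sketched in plain Java. This is a minimal illustration, not the Weka implementation: the class `TrainPercentSketch` and the helper `trainSize` are hypothetical names, and both the numeric parsing of the string value and the rounding rule are assumptions.

```java
public class TrainPercentSketch {

    /**
     * Hypothetical helper: number of training rows for a percentage supplied as text.
     * Assumes the string holds a plain number between 0 and 100.
     */
    public static int trainSize(String trainPercent, int numInstances) {
        double percent = Double.parseDouble(trainPercent.trim());
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("Percentage must be in [0, 100]: " + trainPercent);
        }
        // Assumed rounding rule: round to the nearest whole row.
        return (int) Math.round(numInstances * percent / 100.0);
    }

    public static void main(String[] args) {
        int total = 150;
        int train = trainSize("66", total);
        // Whatever is not used for training is left for testing.
        System.out.println("train=" + train + ", test=" + (total - train)); // train=99, test=51
    }
}
```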
    • setSeed

      @OptionMetadata(displayName="Random seed", description="The random seed to use when shuffling the data", displayOrder=2)
      public void setSeed(String seed)
      Set the random seed to use when shuffling the data
      Parameters:
      seed - the random seed to use, given as a string
    • getSeed

      public String getSeed()
      Get the random seed to use
      Returns:
      the random seed to use
    • setPreserveOrder

      @OptionMetadata(displayName="Preserve instance order", description="Preserve the order of the instances rather than randomly shuffling", displayOrder=3)
      public void setPreserveOrder(boolean preserve)
      Set whether to preserve the order of the instances or not
      Parameters:
      preserve - true to preserve the order rather than randomly shuffling first
    • getPreserveOrder

      public boolean getPreserveOrder()
      Get whether to preserve the order of the instances or not
      Returns:
      true to preserve the order rather than randomly shuffling first
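How the seed and the preserve-order flag interact with the training percentage can be sketched with standard-library collections. This is an illustration under stated assumptions, not the step's actual code: `SplitSketch` and `split` are hypothetical names, rows are modelled as list elements rather than Weka instances, and taking the first rows as the training set is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SplitSketch {

    /** Hypothetical helper: returns two lists, training rows first and test rows second. */
    public static <T> List<List<T>> split(List<T> data, double trainPercent,
                                          long seed, boolean preserveOrder) {
        List<T> copy = new ArrayList<>(data); // leave the caller's list untouched
        if (!preserveOrder) {
            // The same seed always produces the same shuffle, so the split is repeatable.
            Collections.shuffle(copy, new Random(seed));
        }
        int trainSize = (int) Math.round(copy.size() * trainPercent / 100.0);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(copy.subList(0, trainSize)));
        parts.add(new ArrayList<>(copy.subList(trainSize, copy.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }

        // Order preserved: the first 70% of rows form the training set.
        List<List<Integer>> ordered = split(rows, 70, 1L, true);
        System.out.println(ordered.get(0)); // [0, 1, 2, 3, 4, 5, 6]
        System.out.println(ordered.get(1)); // [7, 8, 9]

        // Shuffled: two runs with the same seed give identical splits.
        List<List<Integer>> a = split(rows, 70, 1L, false);
        List<List<Integer>> b = split(rows, 70, 1L, false);
        System.out.println(a.equals(b)); // true
    }
}
```

Preserving order is the natural choice when the row order carries meaning, for example time-ordered data where the test set should follow the training set.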
    • stepInit

      public void stepInit() throws WekaException
      Initialize the step
      Throws:
      WekaException - if a problem occurs
    • processIncoming

      public void processIncoming(Data data) throws WekaException
      Process an incoming data payload (if the step accepts incoming connections)
      Specified by:
      processIncoming in interface BaseStepExtender
      Specified by:
      processIncoming in interface Step
      Overrides:
      processIncoming in class BaseStep
      Parameters:
      data - the data to process
      Throws:
      WekaException - if a problem occurs
    • getIncomingConnectionTypes

      public List<String> getIncomingConnectionTypes()
      Get a list of incoming connection types that this step can accept. Ideally (and if appropriate), this should take into account the state of the step and any existing incoming connections. E.g. a step might be able to accept one (and only one) incoming batch data connection.
      Returns:
      a list of incoming connection types that this step can accept given its current state
    • getOutgoingConnectionTypes

      public List<String> getOutgoingConnectionTypes()
      Get a list of outgoing connection types that this step can produce. Ideally (and if appropriate), this should take into account the state of the step and the incoming connections. E.g. depending on what incoming connection is present, a step might be able to produce a trainingSet output, a testSet output or neither, but not both.
      Returns:
      a list of outgoing connection types that this step can produce given its current state
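The idea of state-dependent connection types can be sketched without any Weka classes. Everything here is an assumption made for illustration: the class `ConnectionTypesSketch`, its methods, the connection type names, and the policy itself (accept one incoming connection only, and offer both outputs once an input exists) are hypothetical and may differ from what this step actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ConnectionTypesSketch {

    // Connection type names used for illustration only.
    static final String DATASET = "dataSet";
    static final String TRAINING_SET = "trainingSet";
    static final String TEST_SET = "testSet";

    /** Assumed policy: accept batch data only while nothing is connected yet. */
    public static List<String> incomingTypes(int numIncomingConnections) {
        if (numIncomingConnections == 0) {
            return Arrays.asList(DATASET, TRAINING_SET, TEST_SET);
        }
        return new ArrayList<>();
    }

    /** Assumed policy: outputs become available once some input is connected. */
    public static List<String> outgoingTypes(int numIncomingConnections) {
        if (numIncomingConnections > 0) {
            return Arrays.asList(TRAINING_SET, TEST_SET);
        }
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        System.out.println(incomingTypes(0)); // [dataSet, trainingSet, testSet]
        System.out.println(outgoingTypes(0)); // []
        System.out.println(incomingTypes(1)); // []
        System.out.println(outgoingTypes(1)); // [trainingSet, testSet]
    }
}
```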
    • outputStructureForConnectionType

      public Instances outputStructureForConnectionType(String connectionName) throws WekaException
      If possible, get the output structure for the named connection type as a header-only set of instances. Can return null if the specified connection type is not representable as Instances or cannot be determined at present.
      Specified by:
      outputStructureForConnectionType in interface Step
      Overrides:
      outputStructureForConnectionType in class BaseStep
      Parameters:
      connectionName - the name of the connection type to get the output structure for
      Returns:
      the output structure as a header-only Instances object
      Throws:
      WekaException - if a problem occurs