Class SubspaceCluster

All Implemented Interfaces:
Serializable, OptionHandler, Randomizable, RevisionHandler

public class SubspaceCluster extends ClusterGenerator
A data generator that produces data points in hyperrectangular subspace clusters.

Valid options are:

 -h
  Prints this help.
 
 -o <file>
  The name of the output file, otherwise the generated data is
  printed to stdout.
 
 -r <name>
  The name of the relation.
 
 -d
  Whether to print debug information.
 
 -S
  The seed for random function (default 1)
 
 -a <num>
  The number of attributes (default 1).
 
 -c
  Class Flag, if set, the cluster is listed in extra attribute.
 
 -b <range>
  The indices for boolean attributes.
 
 -m <range>
  The indices for nominal attributes.
 
 -C <cluster-definition>
  A cluster definition of class 'SubspaceClusterDefinition'
  (definition needs to be quoted to be recognized as 
  a single argument).
 
 Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
 
 -A <range>
  Uses a random uniform distribution for the instances in the cluster.
 
 -U <range>
  Generates totally uniformly distributed instances in the cluster.
 
 -G <range>
  Uses a Gaussian distribution for the instances in the cluster.
 
 -D <num>,<num>
  The attribute min/max (-A and -U) or mean/stddev (-G) for
  the cluster.
 
 -N <num>..<num>
  The range of number of instances per cluster (default 1..50).
 
 -I
  Uses integer instead of continuous values (default continuous).
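
The options above can be assembled into a String array and handed to setOptions(String[]). The following is a minimal sketch of building such an array; the class name and all option values are hypothetical, and it is assumed that -S takes a value, since a default is documented for it. When the options are passed as an array, the whole cluster definition is a single element, which is what quoting achieves on a command line.

```java
import java.util.ArrayList;
import java.util.List;

public class SubspaceClusterOptionsSketch {

    /** Builds an option array in the documented form; every value is illustrative. */
    public static String[] buildOptions() {
        List<String> opts = new ArrayList<>();
        opts.add("-r"); opts.add("demo");  // relation name (hypothetical)
        opts.add("-S"); opts.add("42");    // seed (assumed to take a value)
        opts.add("-a"); opts.add("3");     // number of attributes
        opts.add("-c");                    // list the cluster in an extra attribute
        // The cluster definition is ONE array element:
        // uniform/random on attribute 1, min 0, max 10, 10 to 20 instances.
        opts.add("-C"); opts.add("-A 1 -D 0,10 -N 10..20");
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Print each element in brackets to make the argument boundaries visible.
        for (String option : buildOptions()) {
            System.out.println("[" + option + "]");
        }
    }
}
```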
 
Version:
$Revision: 15747 $
Author:
Gabi Schmidberger (gabi@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
  • Field Details

    • UNIFORM_RANDOM

      public static final int UNIFORM_RANDOM
      cluster type: uniform/random
    • TOTAL_UNIFORM

      public static final int TOTAL_UNIFORM
      cluster type: total uniform
    • GAUSSIAN

      public static final int GAUSSIAN
      cluster type: gaussian
    • TAGS_CLUSTERTYPE

      public static final Tag[] TAGS_CLUSTERTYPE
      the tags for the cluster types
    • CONTINUOUS

      public static final int CONTINUOUS
      cluster subtype: continuous
    • INTEGER

      public static final int INTEGER
      cluster subtype: integer
    • TAGS_CLUSTERSUBTYPE

      public static final Tag[] TAGS_CLUSTERSUBTYPE
      the tags for the cluster subtypes
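
The uniform/random and gaussian cluster types, and the continuous and integer subtypes, can be pictured with a small sampler. This is only a sketch under stated assumptions: the class and method names are hypothetical, rounding is assumed as a stand-in for the integer subtype, and none of it is taken from Weka's implementation. The total uniform type is left out because the text here does not say how its values are laid out.

```java
import java.util.Random;

public class ClusterSamplingSketch {

    /** Uniform/random type: a value drawn uniformly from [min, max). */
    public static double uniform(Random rnd, double min, double max) {
        return min + rnd.nextDouble() * (max - min);
    }

    /** Gaussian type: a value drawn from a normal distribution with the given mean and stddev. */
    public static double gaussian(Random rnd, double mean, double stddev) {
        return mean + rnd.nextGaussian() * stddev;
    }

    /** Integer subtype, modeled here by rounding to the nearest whole number (an assumption). */
    public static double toInteger(double value) {
        return Math.rint(value);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1); // seed 1 mirrors the documented default for -S
        for (int i = 0; i < 5; i++) {
            double u = uniform(rnd, 0.0, 10.0);
            double g = gaussian(rnd, 5.0, 1.5);
            System.out.println(u + "\t" + g + "\t" + toInteger(u));
        }
    }
}
```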
  • Constructor Details

    • SubspaceCluster

      public SubspaceCluster()
      Initializes the generator and sets the number of clusters to 0, since the user has to specify them explicitly.
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this data generator.
      Returns:
      a description of the data generator suitable for displaying in the explorer/experimenter gui
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class ClusterGenerator
      Returns:
      an enumeration of all the available options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a list of options for this object.

      Valid options are:

       -h
        Prints this help.
       
       -o <file>
        The name of the output file, otherwise the generated data is
        printed to stdout.
       
       -r <name>
        The name of the relation.
       
       -d
        Whether to print debug information.
       
       -S
        The seed for random function (default 1)
       
       -a <num>
        The number of attributes (default 1).
       
       -c
        Class Flag, if set, the cluster is listed in extra attribute.
       
       -b <range>
        The indices for boolean attributes.
       
       -m <range>
        The indices for nominal attributes.
       
       -C <cluster-definition>
        A cluster definition of class 'SubspaceClusterDefinition'
        (definition needs to be quoted to be recognized as 
        a single argument).
       
       Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
       
       -A <range>
        Uses a random uniform distribution for the instances in the cluster.
       
       -U <range>
        Generates totally uniformly distributed instances in the cluster.
       
       -G <range>
        Uses a Gaussian distribution for the instances in the cluster.
       
       -D <num>,<num>
        The attribute min/max (-A and -U) or mean/stddev (-G) for
        the cluster.
       
       -N <num>..<num>
        The range of number of instances per cluster (default 1..50).
       
       -I
        Uses integer instead of continuous values (default continuous).
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class ClusterGenerator
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
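
The note that a cluster definition must be quoted to count as a single argument can be illustrated with a tiny quote-aware splitter. This is a hypothetical helper written for illustration only, not Weka's own option parsing, and it handles double quotes without any escape sequences.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedOptionsSketch {

    /** Splits on whitespace but keeps double-quoted sections together; the quotes are dropped. */
    public static List<String> split(String commandLine) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (char ch : commandLine.toCharArray()) {
            if (ch == '"') {
                inQuotes = !inQuotes;   // toggle quoted mode
                hasToken = true;        // a pair of quotes still counts as a token
            } else if (Character.isWhitespace(ch) && !inQuotes) {
                if (hasToken) {         // whitespace outside quotes ends the token
                    tokens.add(current.toString());
                    current.setLength(0);
                    hasToken = false;
                }
            } else {
                current.append(ch);
                hasToken = true;
            }
        }
        if (hasToken) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The quoted definition stays together as the value of -C.
        System.out.println(split("-a 2 -C \"-A 1 -D 0,10 -N 10..20\""));
    }
}
```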
    • setBooleanIndices

      public void setBooleanIndices(String rangeList)
      Sets which attributes are boolean.
      Parameters:
      rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1,
      e.g. first-3,5,6-last
      Throws:
      IllegalArgumentException - if an invalid range list is supplied
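
To make the 1-based range-list syntax concrete, the sketch below expands a string such as first-3,5,6-last into 0-based attribute indices. It is a hypothetical stand-in written for illustration, not the parser Weka uses, and it assumes only the forms shown in the example above: single numbers, the keywords first and last, and dash-separated ranges.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeListSketch {

    /** Converts one 1-based token ("first", "last" or a number) to a 0-based index. */
    static int toIndex(String token, int numAttributes) {
        String t = token.trim();
        if (t.equalsIgnoreCase("first")) return 0;
        if (t.equalsIgnoreCase("last")) return numAttributes - 1;
        int oneBased = Integer.parseInt(t); // NumberFormatException is an IllegalArgumentException
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + token);
        }
        return oneBased - 1;
    }

    /** Expands a comma-separated range list into the 0-based indices it selects. */
    public static List<Integer> expand(String rangeList, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            int dash = part.indexOf('-');
            int lo = toIndex(dash < 0 ? part : part.substring(0, dash), numAttributes);
            int hi = toIndex(dash < 0 ? part : part.substring(dash + 1), numAttributes);
            if (lo > hi) {
                throw new IllegalArgumentException("Invalid range: " + part);
            }
            for (int i = lo; i <= hi; i++) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // With 8 attributes this prints [0, 1, 2, 4, 5, 6, 7]
        System.out.println(expand("first-3,5,6-last", 8));
    }
}
```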
    • setBooleanCols

      public void setBooleanCols(Range value)
      Sets which attributes are boolean.
      Parameters:
      value - the range to use
    • getBooleanCols

      public Range getBooleanCols()
      Returns the range of boolean attributes.
      Returns:
      the range of boolean attributes
    • booleanColsTipText

      public String booleanColsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setNominalIndices

      public void setNominalIndices(String rangeList)
      Sets which attributes are nominal.
      Parameters:
      rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1,
      e.g. first-3,5,6-last
      Throws:
      IllegalArgumentException - if an invalid range list is supplied
    • setNominalCols

      public void setNominalCols(Range value)
      Sets which attributes are nominal.
      Parameters:
      value - the range to use
    • getNominalCols

      public Range getNominalCols()
      Returns the range of nominal attributes.
      Returns:
      the range of nominal attributes
    • nominalColsTipText

      public String nominalColsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getOptions

      public String[] getOptions()
      Gets the current settings of the data generator.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class ClusterGenerator
      Returns:
      an array of strings suitable for passing to setOptions
      See Also:
      • DataGenerator.removeBlacklist(String[])
    • getClusterDefinitions

      public ClusterDefinition[] getClusterDefinitions()
      Returns the currently set clusters.
      Returns:
      the currently set clusters
    • setClusterDefinitions

      public void setClusterDefinitions(ClusterDefinition[] value) throws Exception
      Sets the clusters to use.
      Parameters:
      value - the clusters to use
      Throws:
      Exception - if clusters are not the correct class
    • clusterDefinitionsTipText

      public String clusterDefinitionsTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getSingleModeFlag

      public boolean getSingleModeFlag()
      Gets the single mode flag.
      Specified by:
      getSingleModeFlag in class DataGenerator
      Returns:
      true if the method generateExample can be used
    • defineDataFormat

      public Instances defineDataFormat() throws Exception
      Initializes the format for the dataset produced.
      Overrides:
      defineDataFormat in class DataGenerator
      Returns:
      the output data format
      Throws:
      Exception - data format could not be defined
      See Also:
      • DataGenerator.defaultRelationName()
    • isBoolean

      public boolean isBoolean(int index)
      Returns true if the attribute is boolean.
      Parameters:
      index - the index of the attribute
      Returns:
      true if the attribute is boolean
    • isNominal

      public boolean isNominal(int index)
      Returns true if the attribute is nominal.
      Parameters:
      index - the index of the attribute
      Returns:
      true if the attribute is nominal
    • getNumValues

      public int[] getNumValues()
      Returns the array that stores the number of values for a nominal attribute.
      Returns:
      the array that stores the number of values for a nominal attribute
    • generateExample

      public Instance generateExample() throws Exception
      Generate an example of the dataset.
      Specified by:
      generateExample in class DataGenerator
      Returns:
      the instance generated
      Throws:
      Exception - if the format is not defined or if examples cannot be generated one by one
    • generateExamples

      public Instances generateExamples() throws Exception
      Generate all examples of the dataset.
      Specified by:
      generateExamples in class DataGenerator
      Returns:
      the instances generated
      Throws:
      Exception - if the format is not defined
    • generateFinished

      public String generateFinished() throws Exception
      Compiles documentation about the data generation after the generation process.
      Specified by:
      generateFinished in class DataGenerator
      Returns:
      string with additional information about generated dataset
      Throws:
      Exception - if no input structure has been defined
    • generateStart

      public String generateStart()
      Compiles documentation about the data generation before the generation process.
      Specified by:
      generateStart in class DataGenerator
      Returns:
      string with additional information
    • getRevision

      public String getRevision()
      Returns the revision string.
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method for testing this class.
      Parameters:
      args - should contain arguments for the data producer: