Class FPGrowth

All Implemented Interfaces:
Serializable, Cloneable, AssociationRulesProducer, Associator, CapabilitiesHandler, CapabilitiesIgnorer, CommandlineRunnable, OptionHandler, RevisionHandler, TechnicalInformationHandler

Class implementing the FP-growth algorithm for finding large item sets without candidate generation. Iteratively reduces the minimum support until it finds the required number of rules with the given minimum metric. For more information see:

J. Han, J.Pei, Y. Yin: Mining frequent patterns without candidate generation. In: Proceedings of the 2000 ACM-SIGMID International Conference on Management of Data, 1-12, 2000.

BibTeX:

 @inproceedings{Han2000,
    author = {J. Han and J.Pei and Y. Yin},
    booktitle = {Proceedings of the 2000 ACM-SIGMID International Conference on Management of Data},
    pages = {1-12},
    title = {Mining frequent patterns without candidate generation},
    year = {2000}
 }
 

Valid options are:

 -P <attribute index of positive value>
  Set the index of the attribute value to consider as 'positive'
  for binary attributes in normal dense instances. Index 2 is always
  used for sparse instances. (default = 2)
 
 -I <max items>
  The maximum number of items to include in large items sets (and rules). (default = -1, i.e. no limit.)
 
 -N <require number of rules>
  The required number of rules. (default = 10)
 
 -T <0=confidence | 1=lift | 2=leverage | 3=Conviction>
  The metric by which to rank rules. (default = confidence)
 
 -C <minimum metric score of a rule>
  The minimum metric score of a rule. (default = 0.9)
 
 -U <upper bound for minimum support>
  Upper bound for minimum support. (default = 1.0)
 
 -M <lower bound for minimum support>
  The lower bound for the minimum support. (default = 0.1)
 
 -D <delta for minimum support>
  The delta by which the minimum support is decreased in
  each iteration. (default = 0.05)
 
 -S
  Find all rules that meet the lower bound on
  minimum support and the minimum metric constraint.
  Turning this mode on will disable the iterative support reduction
  procedure to find the specified number of rules.
 
 -transactions <comma separated list of attribute names>
  Only consider transactions that contain these items (default = no restriction)
 
 -rules <comma separated list of attribute names>
  Only print rules that contain these items. (default = no restriction)
 
 -use-or
  Use OR instead of AND for must contain list(s). Use in conjunction
  with -transactions and/or -rules
 
Version:
$Revision: 15519 $
Author:
Mark Hall (mhall{[at]}pentaho{[dot]}com)
See Also:
  • Constructor Details

    • FPGrowth

      public FPGrowth()
      Construct a new FPGrowth object.
  • Method Details

    • generateRulesBruteForce

      public static List<AssociationRule> generateRulesBruteForce(weka.associations.FPGrowth.FrequentItemSets largeItemSets, DefaultAssociationRule.METRIC_TYPE metricToUse, double metricThreshold, int upperBoundMinSuppAsInstances, int lowerBoundMinSuppAsInstances, int totalTransactions)
      Generate all association rules, from the supplied frequet item sets, that meet a given minimum metric threshold. Uses a brute force approach.
      Parameters:
      largeItemSets - the set of frequent item sets
      metricToUse - the metric to use
      metricThreshold - the threshold value that a rule must meet
      upperBoundMinSuppAsInstances - the upper bound on the support in order to accept the rule
      lowerBoundMinSuppAsInstances - the lower bound on the support in order to accept the rule
      totalTransactions - the total number of transactions in the data
      Returns:
      a list of association rules
    • pruneRules

      public static List<AssociationRule> pruneRules(List<AssociationRule> rulesToPrune, ArrayList<Item> itemsToConsider, boolean useOr)
    • getCapabilities

      public Capabilities getCapabilities()
      Returns default capabilities of the classifier.
      Specified by:
      getCapabilities in interface Associator
      Specified by:
      getCapabilities in interface CapabilitiesHandler
      Overrides:
      getCapabilities in class AbstractAssociator
      Returns:
      the capabilities of this classifier
      See Also:
    • globalInfo

      public String globalInfo()
      Returns a string describing this associator
      Returns:
      a description of the evaluator suitable for displaying in the explorer/experimenter gui
    • getTechnicalInformation

      public TechnicalInformation getTechnicalInformation()
      Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
      Specified by:
      getTechnicalInformation in interface TechnicalInformationHandler
      Returns:
      the technical information about this class
    • resetOptions

      public void resetOptions()
      Reset all options to their default values.
    • positiveIndexTipText

      public String positiveIndexTipText()
      Tip text for this property suitable for displaying in the GUI.
      Returns:
      the tip text for this property.
    • setPositiveIndex

      public void setPositiveIndex(int index)
      Set the index of the attribute value to consider as positive for binary attributes in normal dense instances. Index 1 is always used for sparse instances.
      Parameters:
      index - the index to use for positive values in binary attributes.
    • getPositiveIndex

      public int getPositiveIndex()
      Get the index of the attribute value to consider as positive for binary attributes in normal dense instances. Index 1 is always used for sparse instances.
      Returns:
      the index to use for positive values in binary attributes.
    • setNumRulesToFind

      public void setNumRulesToFind(int numR)
      Set the desired number of rules to find.
      Parameters:
      numR - the number of rules to find.
    • getNumRulesToFind

      public int getNumRulesToFind()
      Get the number of rules to find.
      Returns:
      the number of rules to find.
    • numRulesToFindTipText

      public String numRulesToFindTipText()
      Tip text for this property suitable for displaying in the GUI.
      Returns:
      the tip text for this property.
    • setMetricType

      public void setMetricType(SelectedTag d)
      Set the metric type to use.
      Parameters:
      d - the metric type
    • setMaxNumberOfItems

      public void setMaxNumberOfItems(int max)
      Set the maximum number of items to include in large items sets.
      Parameters:
      max - the maxim number of items to include in large item sets.
    • getMaxNumberOfItems

      public int getMaxNumberOfItems()
      Gets the maximum number of items to be included in large item sets.
      Returns:
      the maximum number of items to be included in large items sets.
    • maxNumberOfItemsTipText

      public String maxNumberOfItemsTipText()
      Tip text for this property suitable for displaying in the GUI.
      Returns:
      the tip text for this property.
    • getMetricType

      public SelectedTag getMetricType()
      Get the metric type to use.
      Returns:
      the metric type to use.
    • metricTypeTipText

      public String metricTypeTipText()
      Tip text for this property suitable for displaying in the GUI.
      Returns:
      the tip text for this property.
    • minMetricTipText

      public String minMetricTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getMinMetric

      public double getMinMetric()
      Get the value of minConfidence.
      Returns:
      Value of minConfidence.
    • setMinMetric

      public void setMinMetric(double v)
      Set the value of minConfidence.
      Parameters:
      v - Value to assign to minConfidence.
    • transactionsMustContainTipText

      public String transactionsMustContainTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setTransactionsMustContain

      public void setTransactionsMustContain(String list)
      Set the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.
      Parameters:
      list - a comma separated list of items (empty string indicates no restriction on the transactions).
    • getTransactionsMustContain

      public String getTransactionsMustContain()
      Gets the comma separated list of items that transactions must contain in order to be considered for large item sets and rules.
      Returns:
      return the comma separated list of items that transactions must contain.
    • rulesMustContainTipText

      public String rulesMustContainTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setRulesMustContain

      public void setRulesMustContain(String list)
      Set the comma separated list of items that rules must contain in order to be output.
      Parameters:
      list - a comma separated list of items (empty string indicates no restriction on the rules).
    • getRulesMustContain

      public String getRulesMustContain()
      Get the comma separated list of items that rules must contain in order to be output.
      Returns:
      the comma separated list of items that rules must contain in order to be output.
    • useORForMustContainListTipText

      public String useORForMustContainListTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setUseORForMustContainList

      public void setUseORForMustContainList(boolean b)
      Set whether to use OR rather than AND when considering must contain lists.
      Parameters:
      b - true if OR should be used instead of AND when considering transaction and rules must contain lists.
    • getUseORForMustContainList

      public boolean getUseORForMustContainList()
      Gets whether OR is to be used rather than AND when considering must contain lists.
      Returns:
      true if OR is used instead of AND.
    • deltaTipText

      public String deltaTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying, in the explorer/experimenter gui
    • getDelta

      public double getDelta()
      Get the value of delta.
      Returns:
      Value of delta.
    • setDelta

      public void setDelta(double v)
      Set the value of delta.
      Parameters:
      v - Value to assign to delta.
    • lowerBoundMinSupportTipText

      public String lowerBoundMinSupportTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getLowerBoundMinSupport

      public double getLowerBoundMinSupport()
      Get the value of lowerBoundMinSupport.
      Returns:
      Value of lowerBoundMinSupport.
    • setLowerBoundMinSupport

      public void setLowerBoundMinSupport(double v)
      Set the value of lowerBoundMinSupport.
      Parameters:
      v - Value to assign to lowerBoundMinSupport.
    • upperBoundMinSupportTipText

      public String upperBoundMinSupportTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • getUpperBoundMinSupport

      public double getUpperBoundMinSupport()
      Get the value of upperBoundMinSupport.
      Returns:
      Value of upperBoundMinSupport.
    • setUpperBoundMinSupport

      public void setUpperBoundMinSupport(double v)
      Set the value of upperBoundMinSupport.
      Parameters:
      v - Value to assign to upperBoundMinSupport.
    • findAllRulesForSupportLevelTipText

      public String findAllRulesForSupportLevelTipText()
      Tip text for this property suitable for displaying in the GUI.
      Returns:
      the tip text for this property.
    • setFindAllRulesForSupportLevel

      public void setFindAllRulesForSupportLevel(boolean s)
      If true then turn off the iterative support reduction method of finding x rules that meet the minimum support and metric thresholds and just return all the rules that meet the lower bound on minimum support and the minimum metric.
      Parameters:
      s - true if all rules meeting the lower bound on the support and minimum metric thresholds are to be found.
    • getFindAllRulesForSupportLevel

      public boolean getFindAllRulesForSupportLevel()
      Get whether all rules meeting the lower bound on min support and the minimum metric threshold are to be found.
      Returns:
      true if all rules meeting the lower bound on min support and the min metric threshold are to be found.
    • setOffDiskReportingFrequency

      public void setOffDiskReportingFrequency(int freq)
      Set how often to report some progress when the data is being read incrementally off of the disk rather than loaded into memory.
      Parameters:
      freq - the frequency to print progress.
    • getAssociationRules

      public AssociationRules getAssociationRules()
      Gets the list of mined association rules.
      Specified by:
      getAssociationRules in interface AssociationRulesProducer
      Returns:
      the list of association rules discovered during mining. Returns null if mining hasn't been performed yet.
    • getRuleMetricNames

      public String[] getRuleMetricNames()
      Gets a list of the names of the metrics output for each rule. This list should be the same (in terms of the names and order thereof) as that produced by AssociationRule.getMetricNamesForRule().
      Specified by:
      getRuleMetricNames in interface AssociationRulesProducer
      Returns:
      an array of the names of the metrics available for each rule learned by this producer.
    • canProduceRules

      public boolean canProduceRules()
      Returns true if this AssociationRulesProducer can actually produce rules. Most implementing classes will always return true from this method (obviously :-)). However, an implementing class that actually acts as a wrapper around things that may or may not implement AssociationRulesProducer will want to return false if the thing they wrap can't produce rules.
      Specified by:
      canProduceRules in interface AssociationRulesProducer
      Returns:
      true if this producer can produce rules in its current configuration
    • listOptions

      public Enumeration<Option> listOptions()
      Returns an enumeration describing the available options.
      Specified by:
      listOptions in interface OptionHandler
      Overrides:
      listOptions in class AbstractAssociator
      Returns:
      an enumeration of all the available options.
    • setOptions

      public void setOptions(String[] options) throws Exception
      Parses a given list of options.

      Valid options are:

       -P <attribute index of positive value>
        Set the index of the attribute value to consider as 'positive'
        for binary attributes in normal dense instances. Index 2 is always
        used for sparse instances. (default = 2)
       
       -I <max items>
        The maximum number of items to include in large items sets (and rules). (default = -1, i.e. no limit.)
       
       -N <require number of rules>
        The required number of rules. (default = 10)
       
       -T <0=confidence | 1=lift | 2=leverage | 3=Conviction>
        The metric by which to rank rules. (default = confidence)
       
       -C <minimum metric score of a rule>
        The minimum metric score of a rule. (default = 0.9)
       
       -U <upper bound for minimum support>
        Upper bound for minimum support. (default = 1.0)
       
       -M <lower bound for minimum support>
        The lower bound for the minimum support. (default = 0.1)
       
       -D <delta for minimum support>
        The delta by which the minimum support is decreased in
        each iteration. (default = 0.05)
       
       -S
        Find all rules that meet the lower bound on
        minimum support and the minimum metric constraint.
        Turning this mode on will disable the iterative support reduction
        procedure to find the specified number of rules.
       
       -transactions <comma separated list of attribute names>
        Only consider transactions that contain these items (default = no restriction)
       
       -rules <comma separated list of attribute names>
        Only print rules that contain these items. (default = no restriction)
       
       -use-or
        Use OR instead of AND for must contain list(s). Use in conjunction
        with -transactions and/or -rules
       
      Specified by:
      setOptions in interface OptionHandler
      Overrides:
      setOptions in class AbstractAssociator
      Parameters:
      options - the list of options as an array of strings
      Throws:
      Exception - if an option is not supported
    • getOptions

      public String[] getOptions()
      Gets the current settings of the classifier.
      Specified by:
      getOptions in interface OptionHandler
      Overrides:
      getOptions in class AbstractAssociator
      Returns:
      an array of strings suitable for passing to setOptions
    • buildAssociations

      public void buildAssociations(Instances data) throws Exception
      Method that generates all large item sets with a minimum support, and from these all association rules with a minimum metric (i.e. confidence, lift etc.).
      Specified by:
      buildAssociations in interface Associator
      Parameters:
      data - the instances to be used for generating the associations
      Throws:
      Exception - if rules can't be built successfully
    • toString

      public String toString()
      Output the association rules.
      Overrides:
      toString in class Object
      Returns:
      a string representation of the model.
    • graph

      public String graph(weka.associations.FPGrowth.FPTreeRoot tree)
      Assemble a dot graph representation of the FP-tree.
      Parameters:
      tree - the root of the FP-tree
      Returns:
      a graph representation as a String in dot format.
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Overrides:
      getRevision in class AbstractAssociator
      Returns:
      the revision
    • main

      public static void main(String[] args)
      Main method.
      Parameters:
      args - the commandline options