Class ThresholdCurve

java.lang.Object
weka.classifiers.evaluation.ThresholdCurve
All Implemented Interfaces:
RevisionHandler

public class ThresholdCurve extends Object implements RevisionHandler
Generates points illustrating prediction tradeoffs that can be obtained by varying the threshold value between classes. For example, the typical threshold value of 0.5 means the predicted probability of "positive" must be higher than 0.5 for the instance to be predicted as "positive". The resulting dataset can be used to visualize the precision/recall tradeoff, or for ROC curve analysis (true positive rate vs false positive rate). In each case, Weka simply varies the threshold on the class probability estimates. The Mann-Whitney statistic is used to calculate the area under the ROC curve (AUC).
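As a rough illustration of the threshold sweep described above, the sketch below works on plain arrays of positive-class probabilities and true labels rather than Weka's Prediction objects. ThresholdSweep and Point are hypothetical names, and this is not Weka's implementation. It treats a score at or above each observed value as "positive", so every distinct score yields one curve point. Weka's exact tie and boundary handling may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: sweep a decision threshold over positive-class probabilities. */
public class ThresholdSweep {

    /** Confusion counts at one threshold (hypothetical helper type, not part of Weka). */
    public static final class Point {
        public final double threshold;
        public final int tp, fn, fp, tn;

        Point(double threshold, int tp, int fn, int fp, int tn) {
            this.threshold = threshold;
            this.tp = tp;
            this.fn = fn;
            this.fp = fp;
            this.tn = tn;
        }
    }

    /**
     * For each distinct score t (ascending), predicts "positive" when score >= t
     * and counts the outcomes. scores[i] is the estimated probability of the
     * positive class; actual[i] is true when instance i is really positive.
     */
    public static List<Point> sweep(double[] scores, boolean[] actual) {
        double[] thresholds = Arrays.stream(scores).distinct().sorted().toArray();
        List<Point> points = new ArrayList<>();
        for (double t : thresholds) {
            int tp = 0, fn = 0, fp = 0, tn = 0;
            for (int i = 0; i < scores.length; i++) {
                boolean predictedPositive = scores[i] >= t;
                if (actual[i]) {
                    if (predictedPositive) tp++; else fn++;
                } else {
                    if (predictedPositive) fp++; else tn++;
                }
            }
            points.add(new Point(t, tp, fn, fp, tn));
        }
        return points;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.4, 0.3};
        boolean[] actual = {true, false, true, false};
        for (Point p : sweep(scores, actual)) {
            System.out.printf("t=%.1f TP=%d FN=%d FP=%d TN=%d%n",
                    p.threshold, p.tp, p.fn, p.fp, p.tn);
        }
    }
}
```

Raising the threshold moves instances from the predicted-positive columns (TP, FP) to the predicted-negative columns (FN, TN), which is the tradeoff the curve records.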
Version:
$Revision: 15751 $
Author:
Len Trigg (len@reeltwo.com)
  • Field Details

  • Constructor Details

    • ThresholdCurve

      public ThresholdCurve()
  • Method Details

    • getCurve

      public Instances getCurve(ArrayList<Prediction> predictions)
      Calculates the performance statistics for the default class and returns the results as a set of Instances. Each instance has the following attributes:

      • True Positives
      • False Negatives
      • False Positives
      • True Negatives
      • False Positive Rate
      • True Positive Rate
      • Precision
      • Recall
      • Fallout
      • Threshold: the probability threshold that gives rise to the preceding performance values.

      For the definitions of these measures, see TwoClassStats.

      Parameters:
      predictions - the predictions to base the curve on
      Returns:
      the data points as a set of instances, or null if no predictions have been made.
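      Given the four confusion counts at one threshold, the rate attributes can be sketched with their textbook definitions. This is a minimal illustration that assumes an empty denominator yields 0. ConfusionRates is a hypothetical class, and TwoClassStats remains the authoritative reference for Weka's own definitions, including Fallout, which is left out here.

```java
/** Hypothetical helper: textbook rates derived from one row of confusion counts. */
public class ConfusionRates {

    /** Returns num / den, or 0 when den is 0 (an assumed convention for empty denominators). */
    static double safeDivide(double num, double den) {
        return den == 0 ? 0.0 : num / den;
    }

    /** False positive rate: FP / (FP + TN). */
    public static double falsePositiveRate(int fp, int tn) {
        return safeDivide(fp, fp + tn);
    }

    /** True positive rate, identical to recall: TP / (TP + FN). */
    public static double truePositiveRate(int tp, int fn) {
        return safeDivide(tp, tp + fn);
    }

    /** Precision: TP / (TP + FP). */
    public static double precision(int tp, int fp) {
        return safeDivide(tp, tp + fp);
    }

    public static void main(String[] args) {
        // Counts for a single threshold: TP=2, FN=0, FP=1, TN=1
        System.out.println(falsePositiveRate(1, 1)); // 0.5
        System.out.println(truePositiveRate(2, 0));  // 1.0
        System.out.println(precision(2, 1));         // 0.6666666666666666
    }
}
```

Because recall and true positive rate share one formula, the Recall and True Positive Rate attributes are expected to carry the same values.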
    • getCurve

      public Instances getCurve(ArrayList<Prediction> predictions, int classIndex)
      Calculates the performance statistics for the desired class and returns the results as a set of Instances.
      Parameters:
      predictions - the predictions to base the curve on
      classIndex - index of the class of interest.
      Returns:
      the data points as a set of instances.
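      Choosing a class of interest presumably amounts to a one-vs-rest reduction. The predicted probability for classIndex serves as the score, and an instance counts as positive exactly when its actual class is classIndex. The sketch assumes each prediction is given as a distribution array plus an integer actual class. ClassOfInterest and its methods are hypothetical and not part of Weka.

```java
import java.util.Arrays;

/** Hypothetical sketch: reduce multi-class predictions to a binary problem for one class. */
public class ClassOfInterest {

    /** Score of each prediction = its estimated probability for classIndex. */
    public static double[] scoresFor(double[][] distributions, int classIndex) {
        double[] scores = new double[distributions.length];
        for (int i = 0; i < distributions.length; i++) {
            scores[i] = distributions[i][classIndex];
        }
        return scores;
    }

    /** An instance counts as "positive" exactly when its actual class equals classIndex. */
    public static boolean[] labelsFor(int[] actualClasses, int classIndex) {
        boolean[] positive = new boolean[actualClasses.length];
        for (int i = 0; i < actualClasses.length; i++) {
            positive[i] = actualClasses[i] == classIndex;
        }
        return positive;
    }

    public static void main(String[] args) {
        double[][] dist = {{0.7, 0.2, 0.1}, {0.1, 0.6, 0.3}, {0.2, 0.3, 0.5}};
        int[] actual = {0, 2, 2};
        System.out.println(Arrays.toString(scoresFor(dist, 2)));   // [0.1, 0.3, 0.5]
        System.out.println(Arrays.toString(labelsFor(actual, 2))); // [false, true, true]
    }
}
```

All other classes are pooled as "negative", so a separate curve is obtained for each class index of interest.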
    • getNPointPrecision

      public static double getNPointPrecision(Instances tcurve, int n)
      Calculates the n-point precision, which is the precision averaged over n samples of the curve that are evenly spaced with respect to recall.
      Parameters:
      tcurve - a threshold curve (Instances) previously extracted with getCurve.
      n - the number of points to average over.
      Returns:
      the n-point precision.
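      A minimal sketch of n-point precision follows. It assumes the recall levels are 0, 1/(n-1), up to 1, and that precision between neighbouring curve points is linearly interpolated. Both choices are assumptions, so results may differ in detail from getNPointPrecision. NPointPrecision is a hypothetical class that works on plain recall and precision arrays instead of a tcurve.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of n-point precision over (recall, precision) curve points. */
public class NPointPrecision {

    /**
     * Averages precision at recall levels 0, 1/(n-1), ..., 1, linearly interpolating
     * between curve points (an assumption; requires n >= 2).
     */
    public static double nPointPrecision(double[] recall, double[] precision, int n) {
        if (n < 2 || recall.length == 0) {
            throw new IllegalArgumentException("need n >= 2 and at least one curve point");
        }
        // Order curve points by increasing recall.
        Integer[] order = new Integer[recall.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(idx -> recall[idx]));

        double sum = 0;
        for (int k = 0; k < n; k++) {
            double target = (double) k / (n - 1);
            sum += interpolate(recall, precision, order, target);
        }
        return sum / n;
    }

    /** Linear interpolation of precision at recall r; clamps outside the observed range. */
    static double interpolate(double[] recall, double[] precision, Integer[] order, double r) {
        if (r <= recall[order[0]]) return precision[order[0]];
        for (int j = 1; j < order.length; j++) {
            double r0 = recall[order[j - 1]];
            double r1 = recall[order[j]];
            if (r <= r1) {
                if (r1 == r0) return precision[order[j]];
                double w = (r - r0) / (r1 - r0);
                return precision[order[j - 1]] + w * (precision[order[j]] - precision[order[j - 1]]);
            }
        }
        return precision[order[order.length - 1]];
    }

    public static void main(String[] args) {
        double[] recall = {0.0, 0.5, 1.0};
        double[] precision = {1.0, 0.8, 0.5};
        // 3-point precision: mean of the precision at recall 0, 0.5 and 1
        System.out.println(nPointPrecision(recall, precision, 3));
    }
}
```

A common choice in information retrieval is n = 11 (recall levels 0, 0.1, up to 1).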
    • getPRCArea

      public static double getPRCArea(Instances tcurve)
      Calculates the area under the precision-recall curve (AUPRC).
      Parameters:
      tcurve - a threshold curve (Instances) previously extracted with getCurve.
      Returns:
      the PRC area, or Double.NaN if tcurve was not generated by ThresholdCurve.
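      The area under a precision-recall curve can be approximated in several ways. The sketch below uses the trapezoidal rule over points sorted by recall. That is a common choice but not necessarily the summation getPRCArea performs. PrcArea is a hypothetical class operating on plain arrays, and it returns NaN for an empty curve by analogy with the documented NaN return.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: trapezoidal area under a precision-recall curve. */
public class PrcArea {

    /** Integrates precision over recall with the trapezoidal rule after sorting by recall. */
    public static double trapezoidalArea(double[] recall, double[] precision) {
        if (recall.length == 0) return Double.NaN; // no usable curve
        Integer[] order = new Integer[recall.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble(idx -> recall[idx]));

        double area = 0;
        for (int j = 1; j < order.length; j++) {
            int a = order[j - 1];
            int b = order[j];
            // Width along recall times the mean precision of the two endpoints.
            area += (recall[b] - recall[a]) * (precision[a] + precision[b]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        double[] recall = {1.0, 0.5, 0.0};
        double[] precision = {0.5, 0.8, 1.0};
        System.out.println(trapezoidalArea(recall, precision));
    }
}
```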
    • getROCArea

      public static double getROCArea(Instances tcurve)
      Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
      Parameters:
      tcurve - a threshold curve (Instances) previously extracted with getCurve.
      Returns:
      the ROC area, or Double.NaN if tcurve was not generated by ThresholdCurve.
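      The Wilcoxon-Mann-Whitney formulation can be sketched directly on raw scores. Rank all scores (averaging the ranks of ties), sum the ranks of the P positives to get R_pos, and compute AUC = (R_pos - P(P+1)/2) / (P * N), where N is the number of negatives. This equals the fraction of positive-negative pairs in which the positive scores higher, with ties counting one half. MannWhitneyAuc is a hypothetical class. Unlike getROCArea, it takes scores and labels rather than a tcurve, and it returns NaN when only one class is present.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: ROC AUC as the normalised Mann-Whitney U statistic. */
public class MannWhitneyAuc {

    /** AUC = (R_pos - P(P+1)/2) / (P * N) using tie-averaged 1-based ranks. */
    public static double auc(double[] scores, boolean[] positive) {
        int n = scores.length;
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) order[k] = k;
        Arrays.sort(order, Comparator.comparingDouble(idx -> scores[idx]));

        // Assign 1-based ranks, giving each run of tied scores the average of its ranks.
        double[] rank = new double[n];
        int start = 0;
        while (start < n) {
            int end = start;
            while (end + 1 < n && scores[order[end + 1]] == scores[order[start]]) end++;
            double avgRank = (start + end) / 2.0 + 1.0;
            for (int k = start; k <= end; k++) rank[order[k]] = avgRank;
            start = end + 1;
        }

        long pos = 0, neg = 0;
        double rankSumPos = 0;
        for (int k = 0; k < n; k++) {
            if (positive[k]) {
                pos++;
                rankSumPos += rank[k];
            } else {
                neg++;
            }
        }
        if (pos == 0 || neg == 0) return Double.NaN; // AUC undefined without both classes
        double u = rankSumPos - pos * (pos + 1) / 2.0;
        return u / ((double) pos * neg);
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.4, 0.3};
        boolean[] positive = {true, false, true, false};
        System.out.println(auc(scores, positive)); // 0.75
    }
}
```

The rank-based form runs in O(m log m) for m predictions, avoiding an explicit comparison of every positive-negative pair.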
    • getThresholdInstance

      public static int getThresholdInstance(Instances tcurve, double threshold)
      Gets the index of the instance whose threshold value is closest to the desired target.
      Parameters:
      tcurve - a set of instances that have been generated by this class
      threshold - the target threshold
      Returns:
      the index of the instance whose threshold is closest to the target, or -1 if none could be found (e.g. no data or a bad threshold target)
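      A linear scan is enough to find the closest threshold. In this hypothetical sketch, a "bad threshold target" is assumed to mean NaN or a value outside [0, 1], since thresholds are probabilities. The method name closestIndex is illustrative only, and ties are resolved in favour of the earlier index.

```java
/** Hypothetical sketch: index of the curve point whose threshold is closest to a target. */
public class ClosestThreshold {

    /** Returns the index of the closest threshold, or -1 for no data or a target outside [0, 1]. */
    public static int closestIndex(double[] thresholds, double target) {
        if (thresholds.length == 0 || Double.isNaN(target) || target < 0 || target > 1) {
            return -1; // assumed meaning of "bad threshold target": not a probability
        }
        int best = 0;
        for (int i = 1; i < thresholds.length; i++) {
            // Strict comparison keeps the earlier index on ties.
            if (Math.abs(thresholds[i] - target) < Math.abs(thresholds[best] - target)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] thresholds = {0.9, 0.8, 0.4, 0.3};
        System.out.println(closestIndex(thresholds, 0.5)); // 2
        System.out.println(closestIndex(thresholds, 1.5)); // -1
    }
}
```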
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision