Package weka.core

Class Stopwords

java.lang.Object
weka.core.Stopwords
All Implemented Interfaces:
RevisionHandler

public class Stopwords extends Object implements RevisionHandler
Tests whether a given string is a stopword. All words are lowercased before the test.

The format for reading and writing is one word per line; lines starting with '#' are treated as comments and skipped.
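As a rough illustration of the file format described above, the following JDK-only sketch parses such a list from a string. It is not Weka's implementation: the class name `StopwordFormatSketch` and the method `parse` are hypothetical, and skipping blank lines is an assumption made for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka: parses the one-word-per-line format.
final class StopwordFormatSketch {

  /** Parses stopword-file content into a sorted set of lowercase words. */
  static Set<String> parse(String content) {
    Set<String> words = new TreeSet<>();
    try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
      String line;
      while ((line = reader.readLine()) != null) {
        line = line.trim();
        // Lines starting with '#' are comments; blank lines are skipped (assumption).
        if (line.isEmpty() || line.startsWith("#")) {
          continue;
        }
        words.add(line.toLowerCase(Locale.ROOT));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return words;
  }

  public static void main(String[] args) {
    String content = "# my stopwords\nThe\n  and \n\nof\n";
    System.out.println(parse(content)); // prints [and, of, the]
  }
}
```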

The default stopwords are based on Rainbow.

Accepts the following parameters:

-i file
loads the stopwords from the given file

-o file
saves the stopwords to the given file

-p
outputs the current stopwords on stdout

Any additional parameters are interpreted as words to test as stopwords.

Version:
$Revision: 10203 $
Author:
Eibe Frank (eibe@cs.waikato.ac.nz), Ashraf M. Kibriya (amk14@cs.waikato.ac.nz), FracPete (fracpete at waikato dot ac dot nz)
  • Constructor Summary

    Constructors
    Constructor
    Description
    Stopwords()
    Initializes the stopwords (based on Rainbow).
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    add(String word)
    Adds the given word to the stopword list; the word is automatically trimmed and converted to lower case.
    void
    clear()
    Removes all stopwords.
    Enumeration<String>
    elements()
    Returns a sorted enumeration over all stored stopwords.
    String
    getRevision()
    Returns the revision string.
    boolean
    is(String word)
    Returns true if the given string is a stopword.
    static boolean
    isStopword(String str)
    Returns true if the given string is a stopword.
    static void
    main(String[] args)
    Accepts the command-line parameters -i, -o and -p; any additional parameters are tested as stopwords.
    void
    read(BufferedReader reader)
    Loads the stopwords from the given reader.
    void
    read(File file)
    Loads the stopwords from the given file.
    void
    read(String filename)
    Loads the stopwords from the given file.
    boolean
    remove(String word)
    Removes the word from the stopword list.
    String
    toString()
    Returns the current stopwords as a string.
    void
    write(BufferedWriter writer)
    Writes the current stopwords to the given writer.
    void
    write(File file)
    Writes the current stopwords to the given file.
    void
    write(String filename)
    Writes the current stopwords to the given file.

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
  • Constructor Details

    • Stopwords

      public Stopwords()
      Initializes the stopwords (based on Rainbow).
  • Method Details

    • clear

      public void clear()
      Removes all stopwords.
    • add

      public void add(String word)
      Adds the given word to the stopword list; the word is automatically trimmed and converted to lower case.
      Parameters:
      word - the word to add
    • remove

      public boolean remove(String word)
      Removes the word from the stopword list.
      Parameters:
      word - the word to remove
      Returns:
      true if the word was found in the list and then removed
    • is

      public boolean is(String word)
      Returns true if the given string is a stop word.
      Parameters:
      word - the word to test
      Returns:
      true if the word is a stopword
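The normalization documented for `add` and `is` can be sketched with a plain `HashSet`. This is a hypothetical stand-in (`StopwordSetSketch`), not `weka.core.Stopwords` itself. Ignoring empty input in `add` and passing the argument of `remove` through unchanged are assumptions, since the documentation does not state either behavior.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in, not weka.core.Stopwords itself.
final class StopwordSetSketch {
  private final Set<String> words = new HashSet<>();

  /** Stores the word trimmed and lowercased, as documented for add(String). */
  void add(String word) {
    String normalized = word.trim().toLowerCase(Locale.ROOT);
    if (!normalized.isEmpty()) { // ignoring empty input is an assumption
      words.add(normalized);
    }
  }

  /** Lowercases the candidate before the lookup, as the class description states. */
  boolean is(String word) {
    return words.contains(word.toLowerCase(Locale.ROOT));
  }

  /** Returns true only if the word was present; expects the stored (lowercase) form. */
  boolean remove(String word) {
    return words.remove(word);
  }

  public static void main(String[] args) {
    StopwordSetSketch s = new StopwordSetSketch();
    s.add("  The ");
    System.out.println(s.is("THE"));     // true
    System.out.println(s.remove("the")); // true
    System.out.println(s.remove("the")); // false
    System.out.println(s.is("the"));     // false
  }
}
```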
    • elements

      public Enumeration<String> elements()
      Returns a sorted enumeration over all stored stopwords
      Returns:
      the enumeration over all stopwords
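One way to obtain a sorted `Enumeration<String>` from an unordered set, using only the JDK, is sketched below. The class `SortedEnumerationSketch` and its method are hypothetical names. How Weka stores and sorts its words internally is not stated here, so this shows the general technique only.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper illustrating a sorted enumeration over a word set.
final class SortedEnumerationSketch {

  /** Returns an Enumeration that visits the words in natural (lexicographic) order. */
  static Enumeration<String> sortedElements(Set<String> words) {
    // Copy into a TreeSet so the result is sorted and unaffected by later changes.
    return Collections.enumeration(new TreeSet<>(words));
  }

  public static void main(String[] args) {
    Set<String> words = new HashSet<>();
    Collections.addAll(words, "the", "a", "of");
    Enumeration<String> e = sortedElements(words);
    while (e.hasMoreElements()) {
      System.out.println(e.nextElement()); // prints a, of, the (one per line)
    }
  }
}
```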
    • read

      public void read(String filename) throws Exception
      Loads the stopwords from the given file into this object.
      Parameters:
      filename - the file to read the stopwords from
      Throws:
      Exception - if reading fails
    • read

      public void read(File file) throws Exception
      Loads the stopwords from the given file into this object.
      Parameters:
      file - the file to read the stopwords from
      Throws:
      Exception - if reading fails
    • read

      public void read(BufferedReader reader) throws Exception
      Loads the stopwords from the given reader into this object. The reader is closed automatically.
      Parameters:
      reader - the reader to get the stopwords from
      Throws:
      Exception - if reading fails
    • write

      public void write(String filename) throws Exception
      Writes the current stopwords to the given file
      Parameters:
      filename - the file to write the stopwords to
      Throws:
      Exception - if writing fails
    • write

      public void write(File file) throws Exception
      Writes the current stopwords to the given file
      Parameters:
      file - the file to write the stopwords to
      Throws:
      Exception - if writing fails
    • write

      public void write(BufferedWriter writer) throws Exception
      Writes the current stopwords to the given writer. The writer is closed automatically.
      Parameters:
      writer - the writer to write the stopwords to
      Throws:
      Exception - if writing fails
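The output side of the one-word-per-line format can be sketched as follows. `StopwordWriterSketch` and `format` are hypothetical names. The leading comment line and the fixed `\n` line separator are assumptions chosen to keep the output deterministic, and try-with-resources stands in for the documented automatic close of the writer.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeSet;

// Hypothetical helper, not part of Weka: serializes words one per line.
final class StopwordWriterSketch {

  /** Serializes the words one per line, sorted, after a leading '#' comment line. */
  static String format(Collection<String> words) {
    StringWriter out = new StringWriter();
    // try-with-resources flushes and closes the writer when the block ends.
    try (BufferedWriter writer = new BufferedWriter(out)) {
      writer.write("# stopwords"); // comment header is an assumption
      writer.write('\n');
      for (String word : new TreeSet<>(words)) {
        writer.write(word);
        writer.write('\n');
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toString();
  }

  public static void main(String[] args) {
    // Prints the comment line, then "a" and "the" on separate lines.
    System.out.print(format(Arrays.asList("the", "a")));
  }
}
```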
    • toString

      public String toString()
      Returns the current stopwords as a string.
      Overrides:
      toString in class Object
      Returns:
      the current stopwords
    • isStopword

      public static boolean isStopword(String str)
      Returns true if the given string is a stop word.
      Parameters:
      str - the word to test
      Returns:
      true if the word is a stopword
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision
    • main

      public static void main(String[] args) throws Exception
      Accepts the following parameters:

      -i file
      loads the stopwords from the given file

      -o file
      saves the stopwords to the given file

      -p
      outputs the current stopwords on stdout

      Any additional parameters are interpreted as words to test as stopwords.

      Parameters:
      args - command-line parameters
      Throws:
      Exception - if something goes wrong
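The option layout described for `main` (`-i file`, `-o file`, `-p`, everything else treated as a word to test) could be parsed along the following lines. This is only a sketch of the argument handling. The class `StopwordArgsSketch`, its fields, and the error raised for a missing file argument are all hypothetical and do not reflect how Weka processes its options.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the documented options; not Weka's own option handling.
final class StopwordArgsSketch {
  String input;   // value of -i, or null if absent
  String output;  // value of -o, or null if absent
  boolean print;  // true if -p was given
  final List<String> wordsToTest = new ArrayList<>();

  /** Splits the arguments into options and words to test. */
  static StopwordArgsSketch parse(String[] args) {
    StopwordArgsSketch parsed = new StopwordArgsSketch();
    for (int i = 0; i < args.length; i++) {
      switch (args[i]) {
        case "-i":
          parsed.input = requireValue(args, ++i, "-i");
          break;
        case "-o":
          parsed.output = requireValue(args, ++i, "-o");
          break;
        case "-p":
          parsed.print = true;
          break;
        default:
          // Any additional parameter is a word to test.
          parsed.wordsToTest.add(args[i]);
      }
    }
    return parsed;
  }

  private static String requireValue(String[] args, int index, String flag) {
    if (index >= args.length) {
      throw new IllegalArgumentException(flag + " requires a file argument");
    }
    return args[index];
  }

  public static void main(String[] args) {
    StopwordArgsSketch p = parse(new String[] {"-i", "in.txt", "-p", "the", "weka"});
    System.out.println(p.input + " " + p.print + " " + p.wordsToTest);
    // prints in.txt true [the, weka]
  }
}
```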