Class DatabaseLoader

java.lang.Object
weka.core.converters.AbstractLoader
weka.core.converters.DatabaseLoader
All Implemented Interfaces:
Serializable, BatchConverter, DatabaseConverter, IncrementalConverter, Loader, EnvironmentHandler, OptionHandler, RevisionHandler

Reads Instances from a Database. Can read a database in batch or incremental mode.
In incremental mode, MySQL and HSQLDB are supported.
For all other DBMS, a pseudo-incremental mode is used:
in pseudo-incremental mode, the instances are read into main memory all at once and then provided to the user incrementally.
For incremental loading, the rows in the database table must have a unique order.
The reason is that each fetch retrieves only a single row, by extending the user query with a LIMIT clause.
If this extension is impossible, instances are loaded pseudo-incrementally. To ensure that every row is fetched exactly once, the rows have to be ordered.
Therefore a (primary) key is necessary. This approach was chosen instead of JDBC driver facilities because the latter differ between drivers.
If you use the DatabaseSaver and save instances with an automatically generated primary key (its name is defined in DatabaseUtils), this primary key is used for ordering but is not part of the output. The user-defined SQL query that extracts the instances should not contain LIMIT or ORDER BY clauses (see the -Q option).
In addition, for incremental loading, you can define in the DatabaseUtils file how many distinct values a nominal attribute is allowed to have. If this number is exceeded, the column will become a string attribute.
In batch mode no string attributes will be created.
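
The single-row fetching described above can be illustrated with a small sketch. It is not Weka's actual implementation: the helper name singleRowQuery, the `LIMIT 1 OFFSET n` form (valid in MySQL), and the naive clause check are all assumptions made for illustration. It only shows why a unique ordering key is needed before a LIMIT clause can walk through a table one row at a time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class IncrementalQuerySketch {

    /**
     * Hypothetical helper (not part of Weka): extends a user query so that it
     * returns exactly one row, the one at the given zero-based offset.
     */
    static String singleRowQuery(String userQuery, List<String> keyColumns, int offset) {
        if (keyColumns.isEmpty()) {
            // Without a key there is no stable order, so rows could repeat or be skipped.
            throw new IllegalArgumentException("a key is required for a unique row order");
        }
        if (offset < 0) {
            throw new IllegalArgumentException("offset must not be negative");
        }
        // Naive check only: the user query must not bring its own ORDER BY or LIMIT.
        String upper = userQuery.toUpperCase(Locale.ROOT);
        if (upper.contains(" ORDER BY ") || upper.contains(" LIMIT ")) {
            throw new IllegalArgumentException("query must not contain ORDER BY or LIMIT");
        }
        return userQuery
            + " ORDER BY " + String.join(", ", keyColumns)
            + " LIMIT 1 OFFSET " + offset;
    }

    public static void main(String[] args) {
        // Prints the queries that would fetch the first three rows one by one.
        for (int offset = 0; offset < 3; offset++) {
            System.out.println(
                singleRowQuery("SELECT * FROM Results0", Arrays.asList("id"), offset));
        }
    }
}
```

The column name `id` is a placeholder. Which LIMIT syntax a given DBMS accepts varies, which is consistent with the fallback to pseudo-incremental loading when the query cannot be extended.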

Valid options are:

 -url <JDBC URL>
  The JDBC URL to connect to.
  (default: from DatabaseUtils.props file)
 
 -user <name>
  The user to connect with to the database.
  (default: none)
 
 -password <password>
  The password to connect with to the database.
  (default: none)
 
 -Q <query>
  SQL query of the form
   SELECT <list of columns>|* FROM <table> [WHERE]
  to execute.
  (default: Select * From Results0)
 
 -P <list of column names>
  List of column names uniquely defining a DB row
  (separated by ', ').
  Used for incremental loading.
  If not specified, the key will be determined automatically,
  if possible with the used JDBC driver.
  The auto ID column created by the DatabaseSaver won't be loaded.
 
 -I
  Sets incremental loading
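
As an illustration of how these flags fit together, the following sketch assembles an option array of the kind described above. The helper buildOptions is hypothetical, and the URL, user name, query, and key column in main are placeholders rather than defaults of the Loader.

```java
import java.util.ArrayList;
import java.util.List;

final class LoaderOptionsSketch {

    /** Hypothetical helper: builds the option array from the flags listed above. */
    static String[] buildOptions(String url, String user, String password,
                                 String query, String keys, boolean incremental) {
        List<String> options = new ArrayList<>();
        addIfPresent(options, "-url", url);
        addIfPresent(options, "-user", user);
        addIfPresent(options, "-password", password);
        addIfPresent(options, "-Q", query);
        addIfPresent(options, "-P", keys);
        if (incremental) {
            options.add("-I"); // flag without a value
        }
        return options.toArray(new String[0]);
    }

    /** Adds a flag and its value only when a value was supplied. */
    private static void addIfPresent(List<String> options, String flag, String value) {
        if (value != null && !value.isEmpty()) {
            options.add(flag);
            options.add(value);
        }
    }

    public static void main(String[] args) {
        String[] options = buildOptions(
            "jdbc:mysql://localhost:3306/weka_test", // placeholder URL
            "weka_user",                             // placeholder user
            null,                                    // no password given
            "SELECT * FROM Results0",
            "id",                                    // placeholder key column
            true);
        System.out.println(String.join(" ", options));
    }
}
```

An array of this shape is what setOptions(String[]) accepts. Options that are left out fall back to the defaults listed above.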
 
Version:
$Revision: 12418 $
Author:
Stefan Mutter (mutter@cs.waikato.ac.nz)
  • Constructor Details

    • DatabaseLoader

      public DatabaseLoader() throws Exception
      Constructor
      Throws:
      Exception - if initialization fails
  • Method Details

    • globalInfo

      public String globalInfo()
      Returns a string describing this Loader
      Returns:
      a description of the Loader suitable for displaying in the explorer/experimenter gui
    • setEnvironment

      public void setEnvironment(Environment env)
      Set the environment variables to use.
      Specified by:
      setEnvironment in interface EnvironmentHandler
      Parameters:
      env - the environment variables to use
    • resetOptions

      public void resetOptions()
      Resets the Loader to the settings in either the default DatabaseUtils.props or any property file that the user has specified via setCustomPropsFile().
    • reset

      public void reset()
      Resets the Loader so that it is ready to read a new data set using the currently set options
      Specified by:
      reset in interface Loader
      Overrides:
      reset in class AbstractLoader
      Throws:
      Exception - if an error occurs while disconnecting from the database
    • resetStructure

      public void resetStructure()
      Resets the structure of instances
    • setQuery

      public void setQuery(String q)
      Sets the query to execute against the database
      Parameters:
      q - the query to execute
    • getQuery

      @OptionMetadata(displayName="Query", description="The query to execute", displayOrder=4) public String getQuery()
      Gets the query to execute against the database
      Returns:
      the query
    • queryTipText

      public String queryTipText()
      The tip text for this property
      Returns:
      the tip text
    • setKeys

      public void setKeys(String keys)
      Sets the key columns of a database table
      Parameters:
      keys - a String containing the key columns in a comma separated list.
    • getKeys

      @OptionMetadata(displayName="Key columns", description="Specific key columns to use if a primary key cannot be automatically detected. Used in incremental loading.", displayOrder=5) public String getKeys()
      Gets the names of the key columns
      Returns:
      the names of the key columns
    • keysTipText

      public String keysTipText()
      The tip text for this property
      Returns:
      the tip text
    • setCustomPropsFile

      public void setCustomPropsFile(File value)
      Sets the custom properties file to use.
      Parameters:
      value - the custom props file to load database parameters from; use null or a directory to disable custom properties.
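
A custom properties file is an ordinary Java properties file, so one way to produce one is with java.util.Properties. The sketch below is an assumption-laden illustration: the key names jdbcURL and nominalToStringLimit are assumed to match the DatabaseUtils.props shipped with Weka and should be checked against the installed version, and the helper names are hypothetical. A real file will probably need more entries than the two shown.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.Properties;

final class CustomPropsSketch {

    /** Hypothetical helper: writes a small props file to a temporary location. */
    static File writeProps(String jdbcUrl, int nominalLimit) {
        Properties props = new Properties();
        props.setProperty("jdbcURL", jdbcUrl);                 // assumed key name
        props.setProperty("nominalToStringLimit",              // assumed key name
            Integer.toString(nominalLimit));
        try {
            File file = File.createTempFile("DatabaseUtils", ".props");
            file.deleteOnExit();
            try (OutputStream out = Files.newOutputStream(file.toPath())) {
                props.store(out, "custom database settings (sketch)");
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the file back, which is a quick way to verify what was written. */
    static Properties readProps(File file) {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file.toPath())) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        File file = writeProps("jdbc:mysql://localhost:3306/weka_test", 50); // placeholders
        System.out.println(file + " -> " + readProps(file));
    }
}
```

The resulting File is the kind of value that setCustomPropsFile(File) expects.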
    • getCustomPropsFile

      @OptionMetadata(displayName="DB config file", description="The custom properties that the user can use to override the default ones.", displayOrder=8) @FilePropertyMetadata(fileChooserDialogType=0, directoriesOnly=false) public File getCustomPropsFile()
      Returns the custom properties file in use, if any.
      Returns:
      the custom props file, null if none used
    • customPropsFileTipText

      public String customPropsFileTipText()
      The tip text for this property.
      Returns:
      the tip text
    • setUrl

      public void setUrl(String url)
      Sets the database URL
      Specified by:
      setUrl in interface DatabaseConverter
      Parameters:
      url - string with the database URL
    • getUrl

      @OptionMetadata(displayName="Database URL", description="The URL of the database", displayOrder=1) public String getUrl()
      Gets the URL
      Specified by:
      getUrl in interface DatabaseConverter
      Returns:
      the URL
    • urlTipText

      public String urlTipText()
      The tip text for this property
      Returns:
      the tip text
    • setUser

      public void setUser(String user)
      Sets the database user
      Specified by:
      setUser in interface DatabaseConverter
      Parameters:
      user - the database user name
    • getUser

      @OptionMetadata(displayName="Username", description="The user name for the database", displayOrder=2) public String getUser()
      Gets the user name
      Specified by:
      getUser in interface DatabaseConverter
      Returns:
      name of database user
    • userTipText

      public String userTipText()
      The tip text for this property
      Returns:
      the tip text
    • setPassword

      public void setPassword(String password)
      Sets user password for the database
      Specified by:
      setPassword in interface DatabaseConverter
      Parameters:
      password - the password
    • getPassword

      @OptionMetadata(displayName="Password", description="The database password", displayOrder=3) @PasswordProperty public String getPassword()
      Returns the database password
      Returns:
      the database password
    • passwordTipText

      public String passwordTipText()
      The tip text for this property
      Returns:
      the tip text
    • sparseDataTipText

      public String sparseDataTipText()
      Returns the tip text for this property
      Returns:
      tip text for this property suitable for displaying in the explorer/experimenter gui
    • setSparseData

      public void setSparseData(boolean s)
      Sets whether data should be encoded as sparse instances
      Parameters:
      s - true if data should be encoded as a set of sparse instances
    • getSparseData

      @OptionMetadata(displayName="Create sparse instances", description="Return sparse rather than normal instances", displayOrder=6) public boolean getSparseData()
      Gets whether data is to be returned as a set of sparse instances
      Returns:
      true if data is to be encoded as sparse instances
    • setSource

      public void setSource(String url, String userName, String password)
      Sets the database URL, user name and password
      Parameters:
      url - the database url
      userName - the user name
      password - the password
    • setSource

      public void setSource(String url)
      Sets the database url
      Parameters:
      url - the database url
    • setSource

      public void setSource() throws Exception
      Sets the database url using the DatabaseUtils file
      Throws:
      Exception - if something goes wrong
    • connectToDatabase

      public void connectToDatabase()
      Opens a connection to the database
    • getStructure

      public Instances getStructure() throws IOException
      Determines and returns (if possible) the structure (internally the header) of the data set as an empty set of instances.
      Specified by:
      getStructure in interface Loader
      Specified by:
      getStructure in class AbstractLoader
      Returns:
      the structure of the data set as an empty set of Instances
      Throws:
      IOException - if an error occurs
    • getDataSet

      public Instances getDataSet() throws IOException
      Returns the full data set in batch mode (header and all instances at once).
      Specified by:
      getDataSet in interface Loader
      Specified by:
      getDataSet in class AbstractLoader
      Returns:
      the full data set as a set of Instances
      Throws:
      IOException - if there is no source or parsing fails
    • getNextInstance

      public Instance getNextInstance(Instances structure) throws IOException
      Reads the data set incrementally: gets the next instance in the data set, or returns null if there are no more instances to get. If the structure has not yet been determined by a call to getStructure, this method determines it before returning the next instance.
      Specified by:
      getNextInstance in interface Loader
      Specified by:
      getNextInstance in class AbstractLoader
      Parameters:
      structure - the dataset header information, will get updated in case of string or relational attributes
      Returns:
      the next instance in the data set as an Instance object or null if there are no more instances to be read
      Throws:
      IOException - if there is an error during parsing
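
The null-terminated contract of this method, combined with the pseudo-incremental mode from the class description, can be sketched without any Weka classes. The class below is a hypothetical stand-in, not Weka code: it holds all rows in memory up front and hands them out one per call, returning null once they are exhausted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class PseudoIncrementalSketch<T> {

    // All rows are already in memory; only the hand-out is incremental.
    private final Iterator<T> remaining;

    PseudoIncrementalSketch(List<T> allRows) {
        // Copy the list so later changes by the caller do not affect iteration.
        this.remaining = new ArrayList<>(allRows).iterator();
    }

    /** Returns the next row, or null when no rows are left (rows must not be null). */
    T next() {
        return remaining.hasNext() ? remaining.next() : null;
    }

    public static void main(String[] args) {
        PseudoIncrementalSketch<String> source =
            new PseudoIncrementalSketch<>(Arrays.asList("row-1", "row-2", "row-3"));
        String row;
        // Same loop shape a caller would use with a null-terminated reader.
        while ((row = source.next()) != null) {
            System.out.println(row);
        }
    }
}
```

A caller of getNextInstance would use the same loop shape, passing the structure obtained from getStructure on each call and stopping at the first null.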
    • getOptions

      public String[] getOptions()
      Gets the current option settings
      Specified by:
      getOptions in interface OptionHandler
      Returns:
      the current option settings
    • listOptions

      public Enumeration<Option> listOptions()
      Lists the available options
      Specified by:
      listOptions in interface OptionHandler
      Returns:
      an enumeration of the available options
    • setOptions

      public void setOptions(String[] options) throws Exception
      Sets the options. Valid options are:

       -url <JDBC URL>
        The JDBC URL to connect to.
        (default: from DatabaseUtils.props file)
       
       -user <name>
        The user to connect with to the database.
        (default: none)
       
       -password <password>
        The password to connect with to the database.
        (default: none)
       
       -Q <query>
        SQL query of the form
         SELECT <list of columns>|* FROM <table> [WHERE]
        to execute.
        (default: Select * From Results0)
       
       -P <list of column names>
        List of column names uniquely defining a DB row
        (separated by ', ').
        Used for incremental loading.
        If not specified, the key will be determined automatically,
        if possible with the used JDBC driver.
        The auto ID column created by the DatabaseSaver won't be loaded.
       
       -I
        Sets incremental loading
       
      Specified by:
      setOptions in interface OptionHandler
      Parameters:
      options - the options
      Throws:
      Exception - if options cannot be set
    • getRevision

      public String getRevision()
      Returns the revision string.
      Specified by:
      getRevision in interface RevisionHandler
      Returns:
      the revision
    • main

      public static void main(String[] options)
      Main method.
      Parameters:
      options - the options