ELKI

ELKI (Environment for Developing KDD-Applications Supported by Index-Structures) is a research project of the database systems chair of Professor Hans-Peter Kriegel at the Ludwig Maximilian University of Munich.

It is a modular software framework, written in Java, for knowledge discovery in databases (KDD). The focus is on methods for cluster analysis, outlier detection, and the use of index structures in such methods. As a university research project, it emphasizes easy extensibility, readability, and use in research and teaching, rather than maximum speed or integration with existing business intelligence applications. For example, none of the versions released so far includes a database interface to existing industrial database systems, and using the software requires some prior knowledge and reading the documentation first. The target audience is researchers, students, and software developers.

The modular architecture of the software allows many combinations of the included algorithms, data types, distance measures, and index structures. A newly developed method or distance function can therefore easily be combined with existing modules and evaluated. The visualization modules often make it possible to present results simply and compare them visually. Reusing existing code considerably reduces the effort and time needed to develop such modules, so the software can readily serve as a basis for seminar papers and for diploma and master's theses.
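
The idea of freely combining algorithms with distance measures can be illustrated with a minimal Java sketch. All names below (DistanceFunction, EuclideanDistance, ManhattanDistance, KnnOutlierScore) are hypothetical and chosen for illustration only; they are not ELKI's actual classes or API. The sketch assumes a very simple outlier score, the distance to the k-th nearest neighbor, which works unchanged with any distance implementation passed to it.

```java
import java.util.Arrays;

/** Hypothetical distance abstraction (an assumption, not ELKI's interface). */
interface DistanceFunction {
    double distance(double[] a, double[] b);
}

/** Euclidean (L2) distance between two vectors of equal length. */
class EuclideanDistance implements DistanceFunction {
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

/** Manhattan (L1) distance between two vectors of equal length. */
class ManhattanDistance implements DistanceFunction {
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}

/** Scores each point by the distance to its k-th nearest neighbor. */
class KnnOutlierScore {
    private final DistanceFunction dist;
    private final int k;

    KnnOutlierScore(DistanceFunction dist, int k) {
        this.dist = dist;
        this.k = k;
    }

    double[] score(double[][] data) {
        int n = data.length;
        if (k < 1 || k > n - 1) {
            throw new IllegalArgumentException("k must be in [1, n-1]");
        }
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            // Collect distances from point i to every other point.
            double[] d = new double[n - 1];
            int pos = 0;
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    d[pos++] = dist.distance(data[i], data[j]);
                }
            }
            Arrays.sort(d);
            scores[i] = d[k - 1]; // k-th smallest distance
        }
        return scores;
    }
}

class ModularDemo {
    public static void main(String[] args) {
        double[][] data = {{0, 0}, {1, 0}, {0, 1}, {10, 10}};
        // The same algorithm is combined with two different distance modules.
        for (DistanceFunction f : new DistanceFunction[] {
                new EuclideanDistance(), new ManhattanDistance()}) {
            double[] s = new KnnOutlierScore(f, 1).score(data);
            System.out.println(f.getClass().getSimpleName() + ": " + Arrays.toString(s));
        }
    }
}
```

Because the algorithm only depends on the interface, a new distance measure can be evaluated by writing one small class and passing it in, without touching the algorithm itself.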

Featured algorithms

ELKI includes, among others, the following algorithms (excerpt):

  • k-means clustering
  • EM algorithm (expectation-maximization algorithm)
  • Apriori algorithm
  • Single-linkage clustering
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
  • OPTICS (Ordering Points To Identify the Clustering Structure), including the extensions OPTICS-OF, DeLiClu, HiSC, HiCO and DiSH
  • Local Outlier Factor (LOF)
  • R-tree, R*-tree and M-tree
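
To give a sense of what one of the listed methods does, the following is a minimal Java sketch of DBSCAN. It is an illustration under simplifying assumptions, not ELKI's implementation: the class name SimpleDbscan and its methods are hypothetical, the distance is fixed to Euclidean, and the neighborhood query is a linear scan. In a framework with index support, that range query is the step an index structure such as an R*-tree would accelerate.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified DBSCAN for illustration only. */
class SimpleDbscan {
    static final int UNVISITED = -2;
    static final int NOISE = -1;

    /** Returns one label per point: a cluster id >= 0, or NOISE. */
    static int[] cluster(double[][] data, double eps, int minPts) {
        int n = data.length;
        int[] labels = new int[n];
        Arrays.fill(labels, UNVISITED);
        int clusterId = 0;
        for (int i = 0; i < n; i++) {
            if (labels[i] != UNVISITED) {
                continue;
            }
            List<Integer> neighbors = rangeQuery(data, i, eps);
            if (neighbors.size() < minPts) {
                labels[i] = NOISE; // may later be relabeled as a border point
                continue;
            }
            // Point i is a core point: start a new cluster and expand it.
            labels[i] = clusterId;
            Deque<Integer> seeds = new ArrayDeque<>(neighbors);
            while (!seeds.isEmpty()) {
                int q = seeds.poll();
                if (labels[q] == NOISE) {
                    labels[q] = clusterId; // border point, not expanded further
                }
                if (labels[q] != UNVISITED) {
                    continue;
                }
                labels[q] = clusterId;
                List<Integer> qNeighbors = rangeQuery(data, q, eps);
                if (qNeighbors.size() >= minPts) {
                    seeds.addAll(qNeighbors); // q is itself a core point
                }
            }
            clusterId++;
        }
        return labels;
    }

    /** Linear-scan range query; the result includes the query point itself. */
    static List<Integer> rangeQuery(double[][] data, int p, double eps) {
        List<Integer> result = new ArrayList<>();
        for (int j = 0; j < data.length; j++) {
            double sum = 0.0;
            for (int d = 0; d < data[p].length; d++) {
                double diff = data[p][d] - data[j][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= eps) {
                result.add(j);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {
            {0, 0}, {0, 1}, {1, 0}, {1, 1},
            {10, 10}, {10, 11}, {11, 10}, {11, 11},
            {50, 50}
        };
        System.out.println(Arrays.toString(cluster(data, 1.5, 3)));
    }
}
```

With the sample data in main, the two dense groups of four points each form separate clusters, and the isolated point at (50, 50) is labeled as noise.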

Revision history

Version 0.1 (July 2008) already contained numerous algorithms in the areas of cluster analysis and outlier detection, as well as several index structures such as the R*-tree. The focus of the first release was on subspace clustering methods.

Version 0.2 (July 2009) added functions for time series analysis, in particular distance functions for this purpose.

Version 0.3 (March 2010) expanded the range of outlier detection algorithms and visualization modules.

Version 0.4 (August 2011) added numerous methods for detecting outliers in spatial data.

Version 0.5 (April 2012) focused on the evaluation of cluster analysis results, new visualizations, and a few new algorithms.

Version 0.6 (June 2013 / January 2014) came with an extension for 3D parallel coordinates and additional algorithms.

Awards

ELKI began as an implementation of the doctoral thesis of Dr. Arthur Zimek, who won the "SIGKDD Doctoral Dissertation Award 2009 Runner-up" of the Association for Computing Machinery for his contributions to "Correlation Clustering". The algorithms published in the course of the thesis (4C, COPAC, HiCO, ERiC, CASH), along with a few precursors and comparison methods, are available in ELKI.

The demonstration of version 0.4 at the "Symposium on Spatial and Temporal Databases 2011", featuring the geo-outlier extensions for ELKI, won the conference's "Best Demonstration Paper Award".

Related applications

  • WEKA: a similar project at the University of Waikato, with a focus on classification algorithms.
  • RapidMiner: a freely and commercially available application with a focus on machine learning.
  • KNIME (Konstanz Information Miner): a project at the University of Konstanz for interactive data analysis, based on Eclipse.