RapidMiner, formerly called YALE (Yet Another Learning Environment), is an environment for machine learning and data mining. Experiments can be composed from a large number of operators that can be nested almost arbitrarily. The experimental setup is described in XML and can be built through a graphical user interface. RapidMiner is used in research as well as in industrial and commercial applications.

YALE has been developed since 2001 by the Department of Artificial Intelligence at the Technical University of Dortmund. Since 2004 it has been hosted on SourceForge. In May 2007, YALE was renamed RapidMiner; version 4.0 appeared in July 2007 and version 4.3 in November 2008. RapidMiner is available as open source software under the AGPL or as a proprietary version.

RapidMiner includes more than 500 operators for all tasks of knowledge discovery in databases, i.e. operators for input and output, data preprocessing, machine learning, data mining, text mining, web mining, automatic sentiment analysis of Internet discussion forums (sentiment analysis, opinion mining), and time series analysis and forecasting. In addition, more than 20 methods are available for visualizing high-dimensional data and models. RapidMiner is written in Java and can therefore be used on all major operating systems. All learning methods and attribute evaluators of WEKA have also been integrated.


Some features of RapidMiner are:

  • Knowledge discovery processes can be modeled as operator trees
  • Internal XML representation provides a standardized format for exchanging data mining experiments
  • Scripting language allows automated large-scale data mining
  • Multi-layered data view concept ensures efficient and transparent data management
  • Graphical user interface and command line tool. A Java API allows the use of RapidMiner from your own Java programs
  • Plugin and extension mechanism; some plugins already exist, e.g. for cluster analysis
  • Large number of high-dimensional visualizations of data and models
  • Applications include text mining, multimedia mining, feature engineering, data stream mining and learning of changing concepts, development of ensemble methods, and distributed data mining
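
The idea of modeling a process as a tree of nested operators with an XML representation can be illustrated with a minimal Python sketch. It is not RapidMiner's own code or API: the `Operator` class, the helper functions, the operator names such as `CrossValidation`, and the element and attribute names (`operator`, `parameter`, `name`, `class`, `key`, `value`) are assumptions chosen for illustration, and the actual RapidMiner schema may differ.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Operator:
    """One node of a hypothetical operator tree (not RapidMiner's own class)."""
    name: str
    kind: str
    parameters: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def to_element(self):
        # Assumed layout: <operator name=".." class=".."> with nested children.
        elem = ET.Element("operator", {"name": self.name, "class": self.kind})
        for key, value in self.parameters.items():
            ET.SubElement(elem, "parameter", {"key": key, "value": str(value)})
        for child in self.children:
            elem.append(child.to_element())
        return elem


def from_element(elem):
    """Rebuild an Operator tree from the XML layout produced by to_element."""
    return Operator(
        name=elem.get("name"),
        kind=elem.get("class"),
        parameters={p.get("key"): p.get("value") for p in elem.findall("parameter")},
        children=[from_element(c) for c in elem.findall("operator")],
    )


def depth(op):
    """Number of nesting levels in the operator tree."""
    return 1 + max((depth(c) for c in op.children), default=0)


# Illustrative process: load data, then cross-validate a learner.
process = Operator("Root", "Process", children=[
    Operator("Input", "ExampleSource", {"file": "data.aml"}),
    Operator("Validation", "CrossValidation", {"folds": "10"}, children=[
        Operator("Learner", "DecisionTree"),
        Operator("Evaluation", "Performance"),
    ]),
])

xml_text = ET.tostring(process.to_element(), encoding="unicode")
restored = from_element(ET.fromstring(xml_text))
print(xml_text)
```

Because the whole experiment is captured in one XML string, it can be stored, exchanged, and read back into the same tree, which is the property the XML representation is meant to provide.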