KNIME, the Konstanz Information Miner, is free software for interactive data analysis. Its modular pipelining concept makes it possible to integrate many machine learning and data mining methods. The graphical user interface lets users quickly and easily chain together modules for data preprocessing (ETL: extraction, transformation, loading), modeling and analysis, and visualization. KNIME has been in use in pharmaceutical research since around 2006. It is also used in other areas such as customer relationship management (CRM), business intelligence, and financial data analysis.
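
The modular pipelining idea can be illustrated outside KNIME with a small Python sketch. It is not KNIME code and uses no KNIME API; the function names (`drop_incomplete`, `add_ratio`, `summarize`, `run_pipeline`) and the sample rows are hypothetical. Each function plays the role of one module that receives a table and passes a new table downstream.

```python
from functools import reduce
from typing import Callable, Dict, List

Row = Dict[str, float]
Table = List[Row]
Node = Callable[[Table], Table]  # one "module": table in, table out


def drop_incomplete(table: Table) -> Table:
    """Preprocessing step: keep only rows that have both fields."""
    return [r for r in table if "revenue" in r and "customers" in r]


def add_ratio(table: Table) -> Table:
    """Transformation step: derive revenue per customer."""
    return [{**r, "per_customer": r["revenue"] / r["customers"]} for r in table]


def summarize(table: Table) -> Table:
    """Analysis step: reduce the table to one row holding the mean ratio."""
    mean = sum(r["per_customer"] for r in table) / len(table)
    return [{"mean_per_customer": mean}]


def run_pipeline(table: Table, nodes: List[Node]) -> Table:
    """Feed the output of each node into the next one, in order."""
    return reduce(lambda data, node: node(data), nodes, table)


# Hypothetical sample data; the second row is incomplete on purpose.
rows = [
    {"revenue": 100.0, "customers": 4.0},
    {"revenue": 90.0},
    {"revenue": 60.0, "customers": 3.0},
]
result = run_pipeline(rows, [drop_incomplete, add_ratio, summarize])
```

Because every step has the same table-in, table-out shape, steps can be reordered, removed, or replaced without touching the others, which is the property the pipelining concept relies on. The sketch does not guard against empty tables or zero customers.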


The development of KNIME began in the spring of 2004, when a group of software developers from Silicon Valley, led by Michael Berthold, started designing the platform at the University of Konstanz. From the outset, development focused on a professional software architecture that had to be modular and highly scalable. The first public version was released in mid-2006; particularly in the pharmaceutical sector, this quickly led many commercial software vendors to integrate their tools into KNIME as well. After an article was published in the magazine c't in 2006, KNIME came into increasing use in other areas too. Since June 2008, a Zurich-based company (GmbH) has also offered professional technical support and consulting services for the KNIME platform. In comparisons of open source data mining systems, KNIME performs well above average and stands out in particular for its ease of use.


KNIME has been available under the GPL since version 2.1.


KNIME is developed in Java and deployed as an Eclipse plug-in. Other modules can easily be integrated as additional plug-ins. KNIME's core version contains several hundred modules for data integration (file I/O, database operators with support for all major databases), data transformation (filters, converters, combiners), and the most common methods of data analysis and visualization. Further properties of KNIME:

  • KNIME's core architecture allows the processing of large data volumes that are limited only by the available disk space (most other open source data analysis projects are memory-based and therefore considerably limit the amount of data that can be processed). Examples are the analysis of 300 million address records, 20 million cell images, and 10 million molecular structures.
  • Additional plug-ins allow the integration of methods for text mining, image mining, and time series analysis.
  • Integrations exist for numerous other open source projects, including the methods of WEKA, the statistics project R, LibSVM, JFreeChart, and ImageJ.
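
The difference between disk-limited and memory-limited processing, mentioned in the first point above, can be sketched in a few lines of Python. This is an illustration of the general streaming technique only, not a description of how KNIME is implemented; the file name, column names, and helper functions are assumptions made for the example. The data set stays on disk and is read one row at a time, so memory use does not grow with the number of rows.

```python
import csv
import os
import tempfile
from typing import Iterator, Tuple


def write_sample(path: str, n_rows: int) -> None:
    """Create a CSV file on disk; stands in for a data set too large for RAM."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value"])
        for i in range(n_rows):
            writer.writerow([i, i % 10])


def stream_values(path: str) -> Iterator[float]:
    """Yield one value at a time; only the current row is held in memory."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield float(row["value"])


def running_mean(values: Iterator[float]) -> Tuple[int, float]:
    """Single pass over the stream, keeping only a count and a sum."""
    count, total = 0, 0.0
    for v in values:
        count += 1
        total += v
    return count, (total / count if count else 0.0)


# Hypothetical usage: 100,000 rows are written to a temporary file and
# aggregated without ever building a list of all rows.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    write_sample(path, 100_000)
    count, mean = running_mean(stream_values(path))
```

A memory-based tool would first load every row into a list or array, which caps the data volume at the available RAM; the streaming version is bounded only by the size of the file the disk can hold.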