Machine learning

Machine learning is a generic term for the "artificial" generation of knowledge from experience: an artificial system learns from examples and, once the learning phase is complete, can generalize from them. That is, it does not simply memorize the examples but "recognizes" regularities in the training data. In this way, the system can also assess unknown data.
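
As a minimal sketch of learning a regularity from examples and then applying it to unseen data, the following plain-Python code fits a straight line y = a + b·x by least squares. The training data and the function names `fit_line` and `predict` are hypothetical illustrations, and a linear fit is only one of many possible learners. Note that the fitted model does not reproduce the noisy training values exactly; it captures the underlying trend and can therefore make a prediction for an input it never saw.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns the pair (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict(model, x):
    """Apply the learned regularity to a (possibly unseen) input."""
    a, b = model
    return a + b * x


# Hypothetical training examples: noisy samples of roughly y = 2x + 1.
train_x = [0, 1, 2, 3, 4]
train_y = [1.1, 2.9, 5.0, 7.1, 8.9]

model = fit_line(train_x, train_y)
# x = 10 was never part of the training data.
print(predict(model, 10))
```

With these hypothetical examples the learned slope is about 1.98 and the intercept about 1.04, so the prediction for x = 10 is roughly 20.84.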

From the wide range of possible applications, the following are mentioned here: automated diagnostic procedures, detection of credit card fraud, stock market analysis, classification of DNA sequences, speech and handwriting recognition, and autonomous systems.

The subject is closely related to "knowledge discovery in databases" and "data mining", which, however, focus mainly on finding new patterns and regularities. Many algorithms can be used for both purposes. In particular, methods from knowledge discovery in databases can be used to produce or preprocess training data for machine learning, and, conversely, machine learning algorithms are applied in data mining.

Symbolic and subsymbolic systems

In machine learning, the type and expressive power of the knowledge representation play an important role. A distinction is made between symbolic systems, in which the knowledge (both the examples and the induced rules) is represented explicitly, and subsymbolic systems such as neural networks, which can be "trained" to exhibit predictable behavior but do not allow insight into the learned solutions; here the knowledge is represented implicitly.
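
A perceptron, a single artificial neuron and the simplest kind of neural network, illustrates implicit knowledge. The sketch below, assuming the classic perceptron learning rule, trains one on the logical AND function; the function names, learning rate and epoch count are hypothetical choices. After training, the unit behaves correctly, but what it has learned exists only as numeric weights and a bias, not as a readable rule such as "output 1 if and only if both inputs are 1".

```python
def fire(w, bias, x):
    """Threshold unit: output 1 if the weighted sum exceeds 0, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + bias > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule for samples of the form ((x1, x2), target)."""
    w = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - fire(w, bias, (x1, x2))
            # Weights change only when the current output is wrong.
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            bias += lr * error
    return w, bias


# Hypothetical training data: the logical AND function.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_samples)
# The learned "knowledge" is just a list of numbers.
print(weights, bias)
```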

Among the symbolic approaches, propositional-logic and predicate-logic systems can be distinguished. Representatives of the former are ID3 and its successor C4.5. The latter are developed in the field of inductive logic programming.
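
ID3 builds a decision tree top-down, choosing at each node the attribute with the highest information gain, i.e. the largest reduction in the entropy of the class labels. The following is a simplified sketch in plain Python; it omits the refinements C4.5 adds, such as the gain ratio, continuous attributes and pruning. The toy weather data and the function names are hypothetical. The resulting nested dictionary is an explicit, readable representation of the learned rules, which is what makes such a system symbolic.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on attribute attr."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a tree of nested dicts {attr: {value: subtree}} or a class label."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        # No attributes left: fall back to the majority label.
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree


def classify(tree, row):
    """Follow the explicit rules in the tree down to a leaf label."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


# Hypothetical training examples: should we play outside?
rows = [
    {"outlook": "sunny", "windy": False}, {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False}, {"outlook": "overcast", "windy": True},
    {"outlook": "rain", "windy": False}, {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes", "yes", "no"]

tree = id3(rows, labels, ["outlook", "windy"])
print(tree)
```

Here "outlook" is chosen first because it separates the classes best; only the "rain" branch needs a further split on "windy".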

Algorithmic approaches

In practice, machine learning is implemented by means of algorithms. The various machine learning algorithms can be roughly classified according to the following scheme:

Furthermore, a distinction is made between batch learning, in which all input/output pairs are available at the same time, and continuous (sequential) learning, in which the structure of the network develops incrementally over time.
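
The batch/sequential distinction concerns how the training data arrive, not the particular model. The sketch below uses, as an assumption, a one-parameter linear model y ≈ w·x fitted by least squares instead of a network. The batch version needs all input/output pairs at once; the sequential version keeps running sums and has a usable estimate after every pair. Because the least-squares statistics can be accumulated exactly, both end with the same result here; many sequential methods, such as stochastic gradient descent, only approximate the batch solution. All names and data are hypothetical.

```python
def batch_fit(pairs):
    """Batch learning: all (x, y) pairs are available at the same time."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / sxx


class SequentialFit:
    """Sequential learning: the estimate is updated as each pair arrives."""

    def __init__(self):
        self.sxy = 0.0
        self.sxx = 0.0

    def update(self, x, y):
        self.sxy += x * y
        self.sxx += x * x
        # No estimate is possible until a nonzero input has been seen.
        return self.sxy / self.sxx if self.sxx else None


# Hypothetical input/output pairs, roughly following y = 2x.
pairs = [(1, 2.0), (2, 4.1), (3, 5.9)]

w_batch = batch_fit(pairs)

online = SequentialFit()
for x, y in pairs:
    w_online = online.update(x, y)  # an intermediate estimate after every pair

print(w_batch, w_online)
```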

Software

  • GNU R is a free statistical software package, available for many platforms, with extensions for machine learning (e.g. rpart, randomForest) and data mining.
  • WEKA is Java-based open-source software for machine learning and data mining.
  • KNIME is open-source software for data mining, workflows and data pipelining.
  • RapidMiner (formerly YALE) is an easy-to-use, freely available tool for machine learning and data mining.
  • Shogun is an open-source toolbox for kernel methods.
  • Shark is a free C++ library that implements a variety of machine learning methods.
  • Scikit-learn is a free Python library with many machine learning methods and application examples.