Classification

Classification (from Latin classis, "class", and facere, "to make") is the grouping of objects into classes (groups, sets). Classification occurs in all areas of thought; in philosophy, however, the term "categorization" is more common (see, among others, the categories of Aristotle). A classification requires either abstraction or the formation of a multi-layer structure: a complex (see complexity). In semiotics these two methods are called "class-forming" and "complex-forming" superization.

Since only the classification of real information allows it to be processed in an orderly way, classification is a central component of many applications of computer science. There, so-called automatic classification is studied scientifically as the basis of pattern recognition.

In a classification, errors in the procedure and/or the characteristics of the objects to be classified can lead to wrong decisions, so-called misassignments or misclassifications. To specify how certain an assignment is, it is therefore advisable to accompany every decision with an indication of its reliability.

Definition of terms

This section provides a cross-article overview of the key terms related to classification.

The technical terms of classification are often used inaccurately or even misused, although most have a clearly defined meaning. The linguistic confusion is increased further because some concepts bear more than one name:

  • Class or category. A class groups together things that satisfy a given set of conditions. In general, the things grouped in a class are identical or similar in their characteristics.
  • Class boundaries, decision boundaries. To decide which class an object belongs to, class boundaries (more rarely called decision boundaries) are drawn between the classes. An object belongs to a class if it lies within that class's boundaries.
  • Classification, class system, systematics. The totality of all classes forms a classification, also called a class system or systematics. Frequently used, specialized classifications often have their own names: thesaurus, ontology, directory, taxonomy. Strictly speaking, a classification in this sense is the end product of the process of classifying; usually no distinction is made, and the same word is used for both the process and its result.
  • Assignment to classes (German: Klassierung). Whereas classification in the narrower sense (German: Klassifizierung) first creates the class boundaries, assignment arranges objects into an existing class system. This distinction is specific to German; other languages cover both approaches with the single term classification.
  • Categorization. Classification and categorization are basically the same thing, but "classification" tends to be used in mathematics and engineering, whereas "categorization" tends to be used in psychology and in questions of meaning. Categorization may also include defining the classes.
  • Classifier. The entity that performs a classification or an assignment to classes is called a classifier.
  • Classification method. The classification method determines the procedure the classifier follows. Often no distinction is made between classifier and classification method.
  • Assessment of a classifier. The quality of the classification produced by a classifier or a classification method can be assessed using statistical methods.
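
The terms in this list can be illustrated with a minimal sketch. The feature (a weight in grams), the class boundary at 150 g, the class names, and the labelled sample data are all hypothetical and chosen only for illustration; accuracy and a count of confusions stand in for the statistical assessment mentioned in the last item.

```python
from collections import Counter

# Hypothetical class boundary: objects lighter than 150 g are "small", all others "large".
BOUNDARY_GRAMS = 150.0

def classify(weight_grams: float) -> str:
    """A minimal classifier: assigns an object to a class using one class boundary."""
    return "small" if weight_grams < BOUNDARY_GRAMS else "large"

def assess(samples):
    """Return (accuracy, confusion counts) for a list of (weight, true_class) pairs.

    The confusion counts map (true_class, predicted_class) to a number of objects.
    """
    confusion = Counter((true, classify(weight)) for weight, true in samples)
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    return correct / len(samples), confusion

# Made-up labelled data; the third object lies on the wrong side of the boundary.
samples = [(120.0, "small"), (180.0, "large"), (155.0, "small"), (90.0, "small")]
accuracy, confusion = assess(samples)
print(accuracy)          # 0.75
print(dict(confusion))
```

Here `classify` plays the role of the classifier, `BOUNDARY_GRAMS` is its single class boundary, and `assess` is one simple way to judge its quality.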

Importance

Classification is a fundamental and universal process on which numerous more complex processes are based. Even the simplest organisms divide stimuli from the outside world into classes such as "dangerous" and "safe" or "edible" and "inedible", and separate the important from the unimportant. In organisms with a nervous system, a first classification is already performed when a neuron "decides" whether a stimulus is below the threshold and is ignored, or is above the threshold and is processed further.

People classify the sounds they hear as words and the shapes they see as letters and symbols; classification is the basis of all understanding. The ability to classify is a prerequisite for concept formation and ultimately for intelligence. The article on categorization (cognitive science) goes into more detail on this complex significance of classification.

Automatic classification is used in many technical applications. For example, classifiers evaluate products on conveyor belts as "acceptable" or "defective", or computed tomography images as "tumor" or "harmless". Classification is also of central interest to artificial intelligence.

The fundamental philosophical counterpart to the logic of classification, or subsumption logic (Subsumtionslogik), is the procedure of dialectical logic.

Method

A distinction is made between top-down and bottom-up approaches.

Top-down

In the top-down approach, the process of classification consists of three individual steps:

It is typical of classification that a fixed number of target classes is specified, and only their boundaries remain to be determined. Determining the number and type of classes is the task of category formation.

The selection of meaningful features is essential for a successful classification, because the number of required observations grows exponentially with the number of features. In practice, however, the number of observations is fixed, so that beyond a certain point the quality of the classifier decreases again as further features are added (see also overfitting).
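
The exponential growth can be made concrete with a rough calculation. The sketch assumes, purely for illustration, that each feature is divided into 10 intervals and that at least one observation is wanted in every cell of the resulting grid; the function name and the default of 10 are not from the source.

```python
def required_observations(num_features: int, bins_per_feature: int = 10) -> int:
    """Number of grid cells, i.e. observations needed for one sample per cell."""
    return bins_per_feature ** num_features

for d in (1, 2, 3, 6):
    print(d, required_observations(d))
# 1 10
# 2 100
# 3 1000
# 6 1000000
```

With a fixed data set of, say, a few thousand observations, the grid is already sparsely populated at four features under these assumptions.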

For classification it is therefore important to identify the key features. Various methods are used:

  • Ranking of features by determining their correlation
  • Ranking of features with the help of information theory
  • Feature selection with filter methods
  • Feature selection with wrapper methods
  • Principal component analysis (PCA)
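
As an illustration of the first method in this list, the following sketch ranks features by the absolute Pearson correlation between each feature and a numeric class label. The feature names, the data, and the encoding of the classes as 0 and 1 are hypothetical; a constant feature would need special handling because its variance is zero.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)  # fails for constant sequences (zero variance)

def rank_features(features, labels):
    """Sort feature names by |correlation| with the class label, strongest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical data: class label 0 = apple, 1 = orange.
labels = [0, 0, 0, 1, 1, 1]
features = {
    "weight_g": [150, 160, 155, 158, 152, 161],  # only weakly related to the class
    "hue_deg": [10, 15, 12, 30, 33, 29],         # strongly related to the class
}
print(rank_features(features, labels))  # ['hue_deg', 'weight_g']
```

A ranking of this kind scores each feature on its own, so it cannot detect features that only become useful in combination with others.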

The methods differ in complexity and, depending on the application, deliver satisfactory results; the feature selection may have to be repeated if the selection made turns out to be unsuitable. Care should be taken not to select too few features, because even features that seem less important on their own can play a crucial role in the classification in combination with other features.

Equally critical is the choice of a suitable classification method and an efficient classifier.

Bottom-up

This procedure often takes place unconsciously, as in first language acquisition with its formation of concepts. Wilhelm Kamlah put it as follows:

"Language, which on the one hand seeks to adapt itself to the world and its obtrusive structure, on the other hand gives the world a structure in the first place ... But that there is at all a world already familiar to us, in which the ever new individual is nevertheless mostly encountered as a case of something general that is already known, cannot be explained from language, but from the fact that the recurrence of the same takes place in the world itself ..."

Problems

The following problems can occur when classifying:

Unclear criteria

If the conditions under which an object does or does not belong to a class are not clearly defined, classifying an object is difficult or even impossible. This happens quite often in everyday classification: What criteria distinguish good from evil? What conditions distinguish rock music from jazz? An unambiguous classification needs clearly defined and objectively measurable criteria. To achieve a clear formulation, mathematics is usually used.

Wrong features

Objects can only be sorted into classes when the features considered actually distinguish the classes. For example, living beings cannot be classified as human or ape on the basis of their hair color; in general, hair color says nothing about the class membership of a living being.

Smooth transitions

Smooth transitions between classes contradict the idea of sharp class boundaries. For example, the boundaries of the class "red" in the color spectrum are very difficult to define. To make a classification possible, a sharp dividing line can be introduced artificially. Alternatively, fuzzy logic can be used to operate on these fuzzy sets, and a sharp decision can then be obtained through defuzzification. For smooth transitions in language, see fuzziness (language).
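
A minimal sketch of the fuzzy approach for the class "red" follows. The two wavelength limits (590 nm and 620 nm), the linear ramp between them, and the cut at 0.5 are illustrative assumptions and not a colorimetric standard; the simple cut stands in for a full defuzzification step.

```python
def membership_red(wavelength_nm: float) -> float:
    """Degree of membership in the fuzzy class "red", between 0.0 and 1.0.

    Assumed shape: 0 up to 590 nm, 1 from 620 nm, and a linear ramp in between.
    """
    low, high = 590.0, 620.0
    if wavelength_nm <= low:
        return 0.0
    if wavelength_nm >= high:
        return 1.0
    return (wavelength_nm - low) / (high - low)

def is_red(wavelength_nm: float, cut: float = 0.5) -> bool:
    """Turn the fuzzy degree into a sharp yes/no decision using a cut value."""
    return membership_red(wavelength_nm) >= cut

for nm in (580, 600, 605, 650):
    print(nm, round(membership_red(nm), 2), is_red(nm))
```

The membership value keeps the information that 600 nm is "somewhat red", while `is_red` provides the sharp decision when one is required.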

Inseparability

Inseparability mainly occurs when too few or uninformative features are considered. Seen from this angle, the objects appear mixed up and a clear separation seems impossible. If one tries to distinguish apples from oranges by size and weight alone, without reference to color, many apples and oranges may be so similar in these features that a clear separation is almost impossible. Even when the features are well chosen, a gray area remains in which the decision is uncertain.

Outliers

Unpredictable measurement errors or unusually pronounced individual specimens may cause an object to be misclassified.

Remaining objects

At the end of a classification, a group of objects may be left over that fit into none of the existing classes and for which a new class cannot easily be created without making the entire class system incoherent. An unsatisfactory residual category then has to be set up for these objects.

Trustworthiness of a decision (confidence)

Even if all the features of an object are known, it may be misclassified under certain circumstances (unless the class membership itself is treated as a feature). For example, a hazelnut would usually be classified as harmless, although it can kill people with allergies and, shot from a sling, is a dangerous projectile. Conversely, not every radiograph can be correctly classified as diseased or not diseased, because the image content may not permit any conclusion about class membership. If a decision is forced, which is usually the case in classification, it may be wrong because of such effects.

Modern classifiers therefore supply, along with every decision, a value that indicates the reliability (confidence) of the decision. This measure is commonly referred to as reliability information. A large red tomato would be classified as "ripe" with high reliability; a medium-sized red tomato with a few green spots would also be classified as "ripe", but with lower reliability. Indicating the reliability of a decision is useful for the processing that follows the classification: a mushroom identified as edible with low reliability is not eaten, whereas one identified as edible with high reliability is.

In scenarios where an incorrect classification has more serious disadvantages than no classification at all, it may also be useful to introduce an additional class "not classifiable".
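
A decision with reliability information and an additional class "not classifiable" can be sketched as follows. The redness score in the range 0 to 1, the class boundary at 0.5, the confidence measure, and the minimum confidence of 0.7 are all hypothetical choices made for this example.

```python
def classify_with_reject(redness: float, min_confidence: float = 0.7):
    """Return (label, confidence) for a redness score between 0.0 and 1.0.

    Assumed rule: scores of 0.5 or more mean "ripe", lower scores "unripe".
    The confidence is the larger of the score and its complement, so it is
    0.5 right at the class boundary and 1.0 at either extreme.
    """
    label = "ripe" if redness >= 0.5 else "unripe"
    confidence = max(redness, 1.0 - redness)
    if confidence < min_confidence:
        # Too close to the class boundary: refuse to decide.
        return "not classifiable", confidence
    return label, confidence

print(classify_with_reject(0.95))  # ('ripe', 0.95)
print(classify_with_reject(0.75))  # ('ripe', 0.75)
print(classify_with_reject(0.55))  # ('not classifiable', 0.55)
```

Raising `min_confidence` trades fewer wrong decisions for more objects that are left unclassified.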

Classification representations

  • Decision tree
  • Decision table
  • Attribute space
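
The first two representations can be compared in a small sketch that expresses one and the same classifier as a decision table and as a decision tree. The features (color and firmness) and the classes are hypothetical and serve only to show the two forms side by side.

```python
# Decision table: every combination of feature values is mapped to a class.
DECISION_TABLE = {
    ("red", "soft"): "ripe",
    ("red", "firm"): "ripe",
    ("green", "soft"): "spoiled",
    ("green", "firm"): "unripe",
}

def classify_by_table(color: str, firmness: str) -> str:
    """Look the class up directly from the combination of feature values."""
    return DECISION_TABLE[(color, firmness)]

def classify_by_tree(color: str, firmness: str) -> str:
    """Decision tree: the features are tested one after another."""
    if color == "red":
        return "ripe"      # firmness does not need to be tested on this branch
    if firmness == "soft":
        return "spoiled"
    return "unripe"

# Both representations describe the same classifier.
for key in DECISION_TABLE:
    assert classify_by_table(*key) == classify_by_tree(*key)
```

The table lists all cases explicitly, whereas the tree can skip tests that do not affect the result.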