Histogram

A histogram is a graphical representation of the frequency distribution of metrically scaled characteristics. It requires the data to be divided into classes (bins), which can have a constant or variable width. Directly adjacent rectangles are drawn over the width of each class, and their areas represent the (relative or absolute) class frequencies. The height of each rectangle then represents the (relative or absolute) frequency density, i.e., the (relative or absolute) frequency divided by the width of the corresponding class.

Histograms can also be interpreted as an estimate of the probability density of a continuous random variable.


Application

Histograms are used in descriptive statistics and in image processing. Histograms are used, for example,

  • when you want to see the shape of the frequency distribution and not only summary measures such as the mean and standard deviation,
  • when you suspect that several factors influence a process and want to demonstrate this,
  • when you want to define meaningful specification limits for a process.

In physics and in applied research fields (e.g., analytical science), measured spectra are displayed as histograms; see, for example, the multichannel analyzer.

Construction of a histogram

Constructing a histogram requires the following steps:

Division into classes

To construct a histogram, the range of values of the sample is divided into k contiguous intervals, the so-called classes. It is important to ensure that the marginal classes are not open, i.e., the first class must have a lower limit and the last class an upper limit. The classes do not have to be the same width. However, classes of equal width, at least in the central region, make the histogram easier to interpret. Over each class a rectangle is then erected whose area is proportional to the frequency of that class. The widths of the rectangles correspond to the widths of the classes.
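The division into classes can be sketched in Python as follows. This is a minimal sketch that assumes equal-width classes. The function names `equal_width_edges` and `class_index` are hypothetical. Treating each class as half-open [lower, upper), with the last class closed on the right, is one common convention and is not prescribed by the text.

```python
from bisect import bisect_right


def equal_width_edges(values, k):
    """Return k + 1 class boundaries that split [min, max] into k equal classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Both marginal classes are closed: the first starts at the minimum,
    # the last ends at the maximum.
    return [lo + i * width for i in range(k)] + [hi]


def class_index(x, edges):
    """Index j of the class [edges[j], edges[j+1]) that contains x.

    The last class is treated as closed on the right so that the
    maximum value is not lost.
    """
    if not edges[0] <= x <= edges[-1]:
        raise ValueError("value lies outside the class range")
    j = bisect_right(edges, x) - 1
    return min(j, len(edges) - 2)
```

For the values 1, 3, 4, 7, 9 and k = 4, the boundaries are 1, 3, 5, 7, 9. The values fall into the classes 0, 1, 1, 3, 3.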

Determining the class frequency

When creating a histogram, there are two approaches: the class frequency is given either as an absolute or as a relative value. The absolute value corresponds to the number of values that belong to a class. The relative value, on the other hand, expresses what proportion of all values belongs to a class. Depending on the application, working with either absolute or relative values can be advantageous. In the histogram, the class frequency corresponds to the area of the rectangles.
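The following minimal sketch counts absolute and relative class frequencies for given class boundaries, which may have variable widths. The helper names are hypothetical. The half-open class convention, with the last class closed, is an assumption.

```python
from bisect import bisect_right


def class_frequencies(values, edges):
    """Absolute class frequencies n_j for classes [e_j, e_{j+1}); last class closed."""
    k = len(edges) - 1
    counts = [0] * k
    for x in values:
        if not edges[0] <= x <= edges[-1]:
            raise ValueError(f"{x} lies outside the classes")
        j = min(bisect_right(edges, x) - 1, k - 1)
        counts[j] += 1
    return counts


def relative_frequencies(counts):
    """Relative class frequencies f_j = n_j / n; they sum to 1."""
    n = sum(counts)
    return [c / n for c in counts]
```

Take the values 1, 2, 2, 3, 5, 6, 8, 9 and the boundaries 0, 2, 4, 10. The absolute frequencies are 1, 3, 4 and the relative frequencies are 0.125, 0.375, 0.5.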

Determining the frequency density

Since the area of the j-th rectangle is equal to the class frequency $n_j$, the height of the rectangle, the so-called frequency density $h_j$, is calculated as the ratio $h_j = n_j / d_j$ of the class frequency $n_j$ to the class width $d_j$. This is immediately apparent when you consider that the area of a rectangle is the product of its width (class width) and its height (frequency density). The class with the highest frequency density is called the modal class. If the classes are of equal width, the frequency density and the absolute and relative frequencies are proportional to each other. In this case the heights of the rectangles are comparable and can be interpreted as frequencies (taking the class width into account as the proportionality factor).
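The following sketch computes the frequency density $h_j = n_j / d_j$ and the modal class. It uses unequal class widths to show that the modal class need not be the class with the most values. The function names are hypothetical.

```python
def frequency_densities(counts, edges):
    """h_j = n_j / d_j, where d_j = edges[j+1] - edges[j] is the class width."""
    return [n / (b - a) for n, a, b in zip(counts, edges, edges[1:])]


def modal_class(counts, edges):
    """Index of the class with the highest frequency density (first one on ties)."""
    h = frequency_densities(counts, edges)
    return max(range(len(h)), key=h.__getitem__)
```

Take the counts 1, 3, 4 and the boundaries 0, 2, 4, 10. The densities are 0.5, 1.5 and about 0.67. The middle class is therefore the modal class, even though the wide last class contains the most values.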

Statistical fluctuation of the class frequency

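The class frequencies obtained will often vary when the data collection is repeated. In an election forecast, for example, this raises the question of how precise the collected figures are. As the number of classes grows without bound, the expected fluctuation (standard deviation) of a class frequency tends to the square root of the expected class frequency, $\sqrt{n p_j} \approx \sqrt{n_j}$. The count in class j is binomially distributed with variance $n p_j (1 - p_j)$. As the classes become finer, $p_j \to 0$, so the variance approaches $n p_j$.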
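A small simulation can illustrate this fluctuation. The hypothetical helper `class_count_spread` below assumes uniformly distributed data and k equal-width classes on [0, 1). It repeats the data collection and measures how much the count in one class varies from repetition to repetition. The count is binomial with standard deviation sqrt(n p (1 - p)), where p = 1/k. The spread should therefore come out close to the square root of the mean count when p is small.

```python
import random
import statistics


def class_count_spread(n, k, repeats, seed=0):
    """Repeat a data collection of n uniform values and record how many fall
    into the first of k equal classes on [0, 1).

    Returns (mean count, standard deviation of the count across repeats).
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(repeats):
        # Class 0 is [0, 1/k); each value lands there with probability p = 1/k.
        counts.append(sum(1 for _ in range(n) if rng.random() < 1 / k))
    return statistics.fmean(counts), statistics.stdev(counts)


mean, sd = class_count_spread(n=500, k=25, repeats=1000)
# Binomial theory: sd = sqrt(n p (1 - p)), which approaches sqrt(mean) as p -> 0.
print(f"mean count {mean:.1f}, spread {sd:.2f}, sqrt(mean) {mean ** 0.5:.2f}")
```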

Estimating the number of classes

A histogram only shows a meaningful shape if a sufficiently large number of measurements is available. An unsuitable choice of classes can lead to a misinterpretation of the histogram. There are various rules of thumb for determining the number of classes, i.e., of rectangles.

If needed, the number of bars k can also be calculated using Sturges' rule:

$$k = 1 + \log_2 n$$

However, Sturges' rule should no longer be used, because it does not take the dispersion of the data into account.

Alternatively, the class width h can be calculated using Scott's rule

$$h = 3.49\, s\, n^{-1/3}$$

or the rule of Freedman and Diaconis

$$h = 2\,\mathrm{IQR}\, n^{-1/3}.$$

Here s denotes the standard deviation, n the number of measurements and IQR the interquartile range.

Scott's rule is derived for normally distributed data. For other distributions, Scott introduced correction factors that depend on the skewness and the excess kurtosis.
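The three rules can be sketched with Python's standard library:

- Sturges: k = 1 + log2(n)
- Scott: h = 3.49 s n^(-1/3)
- Freedman and Diaconis: h = 2 IQR n^(-1/3)

Two steps are assumptions of this sketch: rounding Sturges' result up to a whole number, and turning a class width into a class count by covering the data range. The function names are hypothetical. `statistics.quantiles` uses its default ("exclusive") method for the quartiles, and other quartile definitions give slightly different IQR values.

```python
import math
import statistics


def sturges_classes(n):
    """Sturges' rule: k = 1 + log2(n), rounded up to a whole number of classes."""
    return math.ceil(1 + math.log2(n))


def scott_width(values):
    """Scott's rule for the class width: h = 3.49 * s * n^(-1/3)."""
    return 3.49 * statistics.stdev(values) * len(values) ** (-1 / 3)


def freedman_diaconis_width(values):
    """Freedman and Diaconis rule for the class width: h = 2 * IQR * n^(-1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) * len(values) ** (-1 / 3)


def classes_from_width(values, h):
    """Number of classes of width h needed to cover the range of the data."""
    return max(1, math.ceil((max(values) - min(values)) / h))
```

For the eight values 1 to 8, the quartiles are 2.25 and 6.75, so IQR = 4.5. The Freedman and Diaconis width is then 2 · 4.5 · 8^(-1/3) = 4.5, and two classes of that width cover the range of 7.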

Properties

A histogram is an area-proportional representation of frequencies. The area of a rectangle equals the relative frequency $f_j = n_j / n$ of its class multiplied by a proportionality factor $c$.

If $c$ is equal to the sample size $n$, the area of each rectangle is equal to the absolute class frequency $n_j$. In this case the histogram is called absolute, and the sum of the areas of the rectangles equals the sample size $n$. If the exact relative class frequencies are used to construct the histogram ($c = 1$), the histogram is called relative or normalized. Since the areas of the rectangles now correspond to the relative class frequencies, the areas add up to 1 in this case.
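The role of the proportionality factor c can be sketched as follows. The height of rectangle j is c · f_j / d_j, so the rectangle areas add up to c. The names `histogram_heights` and `total_area` are hypothetical.

```python
def histogram_heights(counts, edges, normalized=False):
    """Rectangle heights for an absolute (c = n) or normalized (c = 1) histogram.

    Height of class j is c * f_j / d_j, so its area is c * f_j.
    """
    n = sum(counts)
    c = 1 if normalized else n  # proportionality factor
    return [c * (nj / n) / (b - a) for nj, a, b in zip(counts, edges, edges[1:])]


def total_area(heights, edges):
    """Sum of height times class width over all rectangles."""
    return sum(h * (b - a) for h, a, b in zip(heights, edges, edges[1:]))
```

Take the counts 1, 3, 4 and the boundaries 0, 2, 4, 10. The absolute histogram has a total area of 8, the sample size. The normalized histogram has a total area of 1.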

In a histogram, unlike in a column chart, the rectangles directly adjoin each other, i.e., there are no gaps between them. This is because the widths of the rectangles correspond to the intervals formed (the classes), which also directly adjoin each other.

Also unlike a bar chart, the x-axis of a histogram must be a numerical scale on which the values are ordered and equally spaced.

Three characteristics of a histogram can be used to assess it:

  • The overall shape
  • The spread
  • The centering

Example of a histogram

For 32 European countries, the number of cars per 1000 inhabitants is available as an indicator of prosperity. The values are divided into classes as follows.

Using the table, we obtain the following histogram:

The abscissa shows the class boundaries and the class midpoints. In general, the ordinate of a histogram is not labeled, since otherwise there is a risk of interpreting the height of a rectangle, rather than its area, as the frequency. However, if all classes are of equal width, the class frequency $n_j$ can be used as the height of the rectangles and plotted on the ordinate.

Average-Shifted Histogram

The left image shows four histograms of the same data set. The class width is 2.0 in each histogram, but the start of the first class is shifted (-6.0, -5.5, -5.0 and -4.5). Although the same data set was used in every case, the resulting histograms look different.

Besides the number and width of the classes, the choice of the (left) class boundaries therefore also plays a role. To address this, David Scott proposed the average-shifted histogram.

On the right, the four histograms were superimposed and the histogram heights were then averaged at each value. The result is the average-shifted histogram. In practice, many more than four histograms are usually superimposed and averaged.

The average-shifted histogram solves the problem of choosing the (left) class boundaries, but not the problem of choosing the optimal class width.

The average-shifted histogram can be classified as lying between the histogram and the kernel density estimate.

Figure: Average-shifted histogram calculated by superimposing the four individual histograms.
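The averaging idea can be sketched in plain Python. The sketch assumes m equal-width histograms with class width h whose origins are shifted by h/m relative to each other. The four histograms described above (width 2.0, first class starting at -6.0, -5.5, -5.0 and -4.5) correspond to h = 2.0, m = 4 and origin = -6.0. The function names are hypothetical, and the direct evaluation is chosen for clarity rather than speed.

```python
import math


def histogram_density(data, x, h, origin):
    """Normalized equal-width histogram evaluated at x.

    Classes are [origin + i*h, origin + (i+1)*h); the result is the class
    frequency divided by (n * h), so the rectangle areas sum to 1.
    """
    def class_of(v):
        return math.floor((v - origin) / h)

    j = class_of(x)
    count = sum(1 for v in data if class_of(v) == j)
    return count / (len(data) * h)


def ash_density(data, x, h, m, origin):
    """Average-shifted histogram at x: mean of m histograms with class width h,
    whose origins are shifted by h/m relative to each other."""
    shifted = (origin + s * h / m for s in range(m))
    return sum(histogram_density(data, x, h, o) for o in shifted) / m
```

Consider a single data point at 0 with these parameters. The estimate is 0.5 at 0, 0.25 at ±1 and 0 at ±2. Averaging the shifted histograms turns one rectangle into a stepped, roughly triangular bump. This illustrates why the average-shifted histogram is placed between the histogram and the kernel density estimate.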

Histogram in image processing

In digital image processing, a histogram means the statistical frequency of the gray values or color values in an image. The histogram of an image indicates which gray or color values occur, and it also reflects the contrast and brightness of the image. For a color image, either a single histogram over all possible colors or separate histograms for the individual color channels can be created. The per-channel histograms are usually more useful, since most methods are based on gray-level images, so further processing is immediately possible. The number of color channels in an image depends on its color mode, with one channel per color separation. CMYK images therefore have four color channels, and RGB color images only three.

A histogram visualizes the distribution of the brightness values of an image. The horizontal axis represents the range of possible color values, and the frequency of each color value is plotted over it as a bar. The higher the bar over a color value, the more often this color value occurs in the image.
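The following minimal sketch computes gray-value and per-channel histograms. It assumes an 8-bit image (256 levels) stored as a list of rows. Gray-level pixels are integers, and RGB pixels are (r, g, b) tuples. The function names and this image representation are hypothetical choices for illustration.

```python
def gray_histogram(image, levels=256):
    """Count how often each gray value 0..levels-1 occurs in a 2-D image."""
    hist = [0] * levels
    for row in image:
        for g in row:
            hist[g] += 1
    return hist


def channel_histograms(rgb_image, levels=256):
    """One histogram per color channel for an image made of (r, g, b) tuples."""
    hists = [[0] * levels for _ in range(3)]
    for row in rgb_image:
        for pixel in row:
            for channel, value in enumerate(pixel):
                hists[channel][value] += 1
    return hists
```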

Histograms are frequently found in digital photography. Well-equipped digital cameras show the histogram on their display as an aid to a more balanced exposure. They can show it in real time while the shot is being composed, or for previously saved images. Viewing the histogram lets the photographer judge the result of a planned or already taken photo more accurately than the camera's image preview alone. For example, typical errors such as underexposure and overexposure can be recognized and corrected with appropriate exposure compensation. The brightness and especially the contrast range of the image play a major role in later processing and use. It is therefore worth paying attention to the histogram display while photographing.

A classic application of histograms in image processing is histogram equalization (German: Äqualisierung). Here the histogram is transformed by a mapping of the gray values. This achieves a better distribution of the tonal values, which goes beyond a mere contrast enhancement.
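The text does not spell out the mapping used for equalization. This sketch assumes a common textbook variant that maps each gray value through the normalized cumulative histogram:

g → round((cdf(g) - cdf_min) / (N - cdf_min) · (L - 1))

Here N is the number of pixels and L the number of gray levels. The function name `equalize` and the list-of-rows image format are hypothetical.

```python
def equalize(image, levels=256):
    """Histogram equalization of a gray-level image (list of rows of ints)."""
    hist = [0] * levels
    for row in image:
        for g in row:
            hist[g] += 1
    # Cumulative histogram: cdf[g] = number of pixels with value <= g.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    total = cdf[-1]
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # only one gray value present: nothing to spread
        return [row[:] for row in image]
    # Lookup table from old to new gray values; values below the smallest
    # occurring gray value never appear, so clamping them to 0 is harmless.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[g] for g in row] for row in image]
```

An image that only uses the gray values 10, 20 and 30 is stretched to 0, 85 and 255. The spacing of the new values follows how many pixels each old value accounts for, not just a linear contrast stretch.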

Example: high-key and low-key photography

In low-key shots, the details are concentrated in the dark tones. The peaks of the histogram therefore lie mostly at the low end, because there are many pixels with low tonal values.

For high-key shots, the opposite applies: there are many pixels with high tonal values and hardly any peaks in the low tonal values.

In overexposed shots, the distribution "hugs" the right (bright) side, and its peak may be cut off at the edge. Not all bright details are then reproduced, because a certain brightness range is cut off and rendered as white.
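These observations can be turned into a few simple numbers, as in the following sketch. It assumes an 8-bit gray-level image stored as a list of rows. It reports the share of pixels at the darkest and brightest possible value (clipped shadows and highlights) and the share in the lower half of the tonal range. The function name and the choice of these three measures are hypothetical.

```python
def exposure_summary(image, levels=256):
    """Summarize where the tonal values of a gray-level image lie."""
    values = [g for row in image for g in row]
    n = len(values)
    return {
        # Pixels at the lowest value: shadows cut off to black.
        "clipped_shadows": sum(g == 0 for g in values) / n,
        # Pixels at the highest value: highlights cut off to white.
        "clipped_highlights": sum(g == levels - 1 for g in values) / n,
        # Share of dark tones, large for low-key shots, small for high-key shots.
        "lower_half": sum(g < levels // 2 for g in values) / n,
    }
```

A large `lower_half` share points to a low-key image, and a small one to a high-key image. A noticeable `clipped_highlights` share suggests overexposure.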

History

A histogram probably first appeared in 1786 in the work "The Commercial and Political Atlas" by the Scottish engineer and economist William Playfair, who also introduced the bar chart and the pie chart. In 1833, the Frenchman André-Michel Guerry used histograms to visualize data. The histogram was further developed by the Belgian statistician and social scientist Adolphe Quetelet around 1846. The term "histogram" (historical diagram) was, however, first used by the English mathematician Karl Pearson in 1891 in a series of lectures, and finally introduced in its present meaning in 1895.
