Jaccard index
The Jaccard coefficient or Jaccard index, named after the Swiss botanist Paul Jaccard (1868-1944), is a measure of the similarity of sets.
Definition
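To calculate the Jaccard coefficient of two sets A and B, divide the number of elements they have in common (the size of their intersection) by the size of their union:
$$J(A,B) = \frac{|A \cap B|}{|A \cup B|}$$
For any two finite sets A and B that are not both empty:
$$0 \le J(A,B) \le 1$$
The closer the Jaccard coefficient is to 1, the greater the similarity between the sets; it equals 1 exactly when the sets are identical. The minimum value of the Jaccard coefficient is 0, which occurs exactly when the sets are disjoint.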
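The definition translates directly into code using Python's built-in set operators. This is a minimal sketch; the function name `jaccard` is chosen for illustration, and returning 1.0 for two empty sets is an assumed convention, since the formula itself is 0/0 in that case.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two finite sets."""
    union = a | b
    if not union:
        # Assumed convention: J(∅, ∅) = 1.0, because the formula is undefined (0/0).
        return 1.0
    # a & b is the intersection, a | b the union
    return len(a & b) / len(union)


print(jaccard({"x", "y", "z"}, {"y", "z", "w"}))  # 2 common of 4 total -> 0.5
```

Disjoint sets give 0.0 and identical non-empty sets give 1.0, matching the bounds stated above.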
Example
The two sets A and B have the Jaccard coefficient
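As a worked illustration, take two hypothetical sets chosen only for this sketch, $A = \{1, 2, 3, 4\}$ and $B = \{3, 4, 5\}$. They share two elements out of five distinct elements in total:

```latex
\begin{aligned}
A \cap B &= \{3, 4\}, & |A \cap B| &= 2 \\
A \cup B &= \{1, 2, 3, 4, 5\}, & |A \cup B| &= 5 \\
J(A,B) &= \frac{|A \cap B|}{|A \cup B|} = \frac{2}{5} = 0.4
\end{aligned}
```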
Jaccard metric
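From the Jaccard coefficient, one can derive the Jaccard metric (also called the Jaccard distance). It is calculated according to the formula
$$d_J(A,B) = 1 - J(A,B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$
Equivalently, in terms of the symmetric difference $A \,\triangle\, B$, and in general taking values between 0 and 1:
$$d_J(A,B) = \frac{|A \,\triangle\, B|}{|A \cup B|}, \qquad 0 \le d_J(A,B) \le 1$$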
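A small sketch of the Jaccard distance, using the symmetric-difference form (Python's `^` operator on sets). The names `jaccard_distance` and `all_subsets` are hypothetical helpers, and treating two empty sets as distance 0.0 is an assumed convention. The brute-force loop checks the triangle inequality, the property that makes this a metric, over every subset of a tiny universe; it illustrates the property rather than proving it.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance |A Δ B| / |A ∪ B|, which equals 1 - J(A, B)."""
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty sets are at distance 0
    return len(a ^ b) / len(union)  # a ^ b is the symmetric difference


def all_subsets(universe: set) -> list:
    """Every subset of a small universe, including the empty set."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


# Exhaustively check d(x, z) <= d(x, y) + d(y, z) on all subsets of {1, 2, 3}.
subsets = all_subsets({1, 2, 3})
for x in subsets:
    for y in subsets:
        for z in subsets:
            lhs = jaccard_distance(x, z)
            rhs = jaccard_distance(x, y) + jaccard_distance(y, z)
            assert lhs <= rhs + 1e-12  # small tolerance for float rounding

print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # |{1, 4}| / 4 -> 0.5
```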
Applications
In text mining, and in particular in duplicate detection, the Jaccard similarity is a well-known measure of the similarity between two elements. The two strings are decomposed into tokens (for example, by splitting at spaces or by forming n-grams), and the resulting sets of substrings are used to calculate the similarity of the two strings as described above.
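The tokenization step can be sketched as follows, comparing two strings once by whitespace-separated words and once by character n-grams. This is an illustrative sketch: the helper names are hypothetical, n = 3 is an arbitrary choice, and returning the whole string as a single token when it is shorter than n is an assumed convention.

```python
def word_tokens(text: str) -> set:
    """Split at whitespace and collect the distinct words."""
    return set(text.split())


def char_ngrams(text: str, n: int = 3) -> set:
    """Distinct character n-grams of a string (n = 3 is an illustrative choice)."""
    if len(text) < n:
        # Assumed convention: a short string becomes a single token.
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


s1 = "the quick brown fox"
s2 = "the quick brown dog"
print(jaccard(word_tokens(s1), word_tokens(s2)))  # 3 shared of 5 words -> 0.6
print(jaccard(char_ngrams(s1), char_ngrams(s2)))  # 14 shared of 20 trigrams -> 0.7
```

Character n-grams are less sensitive to a single changed word than word tokens are, which is one reason they are often preferred for near-duplicate detection.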