Cosine similarity

Cosine similarity is a measure of the similarity between two vectors, defined as the cosine of the angle between them. The cosine of an angle of zero is one; for any other angle, the cosine is less than one. It is therefore a measure of whether two vectors point in approximately the same direction.

Typical applications include the comparison of documents and multimedia objects, text mining, data mining, the detection of plagiarism in search engines, and, in cryptography, the deciphering of encrypted texts. In 2011, the Codex Copiale, a document written in cipher, was deciphered by calculating the cosine similarity of the character placement vectors.

Mathematics

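The cosine of the angle between two vectors is determined by the standard scalar product:

$$\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \, \|\mathbf{b}\| \cos\theta$$

The cosine similarity of two vectors a and b is the cosine of the included angle θ:

$$\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \, \|\mathbf{b}\|} = \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \, \sqrt{\sum_{i=1}^{n} b_i^2}}$$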

The cosine similarity therefore ranges from -1, for exactly opposite directions, to 1, for exactly the same direction. A value of 0 usually indicates independence (orthogonality). Intermediate values indicate degrees of similarity or dissimilarity.
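
As an illustration, the cosine similarity can be computed as the scalar product of the two vectors divided by the product of their Euclidean lengths. The following is a minimal sketch in plain Python; the function name cosine_similarity and the choice to raise an error for zero vectors are assumptions made for this example, not part of any particular library.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between the vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Standard scalar product of a and b.
    dot = sum(x * y for x, y in zip(a, b))
    # Euclidean lengths of a and b.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption for this sketch: the angle to a zero vector is undefined.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 2], [2, 4]))    # same direction: close to 1
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal: 0
print(cosine_similarity([1, 2], [-1, -2]))  # opposite direction: close to -1
```

Because of floating-point rounding, the results for parallel vectors may differ from exactly 1 or -1 in the last digits.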

In text comparison, the attribute vectors a and b are usually term-frequency vectors of the documents, whose components can never be negative. In this case the cosine similarity therefore always lies between 0 and 1.
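
The text case can be sketched in the same way. The example below builds term-frequency vectors by counting lowercase, whitespace-separated words, which is a deliberately simple tokenization assumed only for illustration; the helper names term_frequencies and text_cosine_similarity are likewise hypothetical.

```python
import math
from collections import Counter

def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated word occurs."""
    return Counter(text.lower().split())

def text_cosine_similarity(text_a, text_b):
    """Cosine similarity of the term-frequency vectors of two texts."""
    freq_a = term_frequencies(text_a)
    freq_b = term_frequencies(text_b)
    # Only words that occur in both texts contribute to the scalar product.
    shared = freq_a.keys() & freq_b.keys()
    dot = sum(freq_a[word] * freq_b[word] for word in shared)
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    # Assumption for this sketch: an empty text has no defined direction.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for an empty text")
    return dot / (norm_a * norm_b)

# Two texts sharing several words: result is about 0.75.
print(text_cosine_similarity("the cat sat on the mat", "the cat lay on the rug"))
# Two texts with no words in common: result is 0.
print(text_cosine_similarity("red green blue", "one two three"))
```

Since all word counts are non-negative, the scalar product cannot be negative, which is why the result never falls below 0.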
