Day65: Cosine similarity

Posted by csiu on April 30, 2017 | with: 100daysofcode
  by Celia Siu (in Portland)

Today I added cosine similarity as an additional option for calculating the nearest document in the Kickstarter project that I’m working on. The associated modifications for this task are found here.

Cosine similarity

In cosine similarity, the similarity between two vectors is measured by the cosine of the angle between them.

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them… [In cosine similarity,] two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.
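To make the definition concrete, here is a minimal pure-Python sketch. The helper name cosine_sim and the toy vectors are my own for illustration and are not part of the Kickstarter project code.

```python
import math

def cosine_sim(a, b):
    """Cosine of the angle between two non-zero vectors a and b (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Same orientation, different magnitude: close to 1
same = cosine_sim([1, 2], [2, 4])
# Vectors at 90 degrees: close to 0
orthogonal = cosine_sim([1, 0], [0, 3])
# Diametrically opposed: close to -1
opposite = cosine_sim([1, 2], [-3, -6])
```

Note that scaling either vector leaves the result unchanged, which is the "independent of their magnitude" part of the definition.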

Cosine similarity in Python

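In Python, you can compute the cosine similarity between two sets of vectors with cosine_similarity from the sklearn.metrics.pairwise module. The same module's pairwise_distances with metric="cosine" returns the cosine distance instead, which is 1 minus the cosine similarity.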

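from sklearn.metrics.pairwise import pairwise_distances

# X and Y are 2D arrays with one vector (document) per row.
# The result is the cosine distance, i.e. 1 - cosine similarity.
pairwise_distances(X, Y, metric="cosine")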
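As a rough, self-contained illustration of the two functions side by side, here is a sketch with made-up toy data. The arrays X and Y below are hypothetical stand-ins for document vectors, not the actual Kickstarter features.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Hypothetical toy data: each row is one document vector.
X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[2.0, 0.0], [1.0, 1.0]])

# Similarity of every row of X against every row of Y, shape (2, 2).
sim = cosine_similarity(X, Y)

# Cosine distance for the same pairs; equal to 1 - sim.
dist = pairwise_distances(X, Y, metric="cosine")

# For each row of X, the index of the nearest row of Y (smallest distance).
nearest = dist.argmin(axis=1)
```

Because the distance is 1 minus the similarity, the nearest document is the one with the smallest distance, or equivalently the largest similarity.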

Other distance metrics you can use include: