Today I added cosine similarity as an additional option for calculating the nearest document in the Kickstarter project that I’m working on. The associated modifications for this task can be found here.
Cosine similarity
In cosine similarity, the similarity between two vectors is measured by the cosine of the angle between them.
Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them… [In cosine similarity,] two vectors with the same orientation have a cosine similarity of 1, two vectors at 90° have a similarity of 0, and two vectors diametrically opposed have a similarity of -1, independent of their magnitude.
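To make the definition concrete, here is a minimal sketch that computes the cosine of the angle as the dot product divided by the product of the vector norms. The helper name cosine_sim and the toy vectors are my own, chosen only to illustrate the three cases mentioned above (same orientation, 90°, diametrically opposed).

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two non-zero 1-D vectors (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors, assumed for illustration only
same_direction = cosine_sim([1, 2], [2, 4])   # same orientation, different magnitude
orthogonal = cosine_sim([1, 0], [0, 3])       # 90 degrees apart
opposite = cosine_sim([1, 2], [-1, -2])       # diametrically opposed

print(same_direction, orthogonal, opposite)
```

Up to floating-point rounding, the three values should come out close to 1, 0 and -1, regardless of how long the vectors are.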
Cosine similarity in Python
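In Python, you can compute the cosine similarity between two vectors with cosine_similarity, or the corresponding cosine distance (1 minus the similarity) with pairwise_distances and metric="cosine". Both functions are in the sklearn.metrics.pairwise module.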
from sklearn.metrics.pairwise import pairwise_distances
# X and Y are 2-D arrays with one sample (e.g. one document vector) per row
pairwise_distances(X, Y, metric="cosine")
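As a rough, self-contained sketch of how the two functions relate: pairwise_distances with metric="cosine" returns cosine distances, which equal 1 minus the values from cosine_similarity. The arrays X and Y below are made-up toy data, not the vectors from the Kickstarter project, and using argmin to pick the nearest row is just one way the lookup could be done.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, pairwise_distances

# Hypothetical toy data: each row is one document vector (e.g. word counts)
X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
Y = np.array([[2.0, 4.0, 0.0],
              [3.0, 0.0, 0.0],
              [0.0, 2.0, 6.0]])

sim = cosine_similarity(X, Y)                     # similarities, shape (2, 3)
dist = pairwise_distances(X, Y, metric="cosine")  # distances, shape (2, 3)

# For each row of X, the index of the row of Y with the smallest cosine distance
nearest = dist.argmin(axis=1)
print(nearest)
```

Here the first row of X points in the same direction as the first row of Y, and the second row of X in the same direction as the third row of Y, so those are the rows selected as nearest.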
Other distance metrics you can use include: