Content-Based Recommender System Built Using a Vector Space Model
Question
A content-based recommender system is built using a vector space model. Given query document vector A = (1, 0, 1), compute its cosine similarity with each of the other document vectors and rank the recommendations.
Computing cosine similarity in a vector space model for recommendations.
Similarity Formula
cos(θ) = (A · B) / (||A|| · ||B||), where higher values mean more similar
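1. Recall Cosine Similarity
Cosine similarity between vectors A and B is cos(θ) = (A · B) / (||A|| · ||B||), where A · B is the dot product and ||·|| is the Euclidean norm. Values range from −1 (opposite direction) to 1 (identical direction). When all components are non-negative, as with the term vectors here, values fall between 0 and 1.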
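The formula can be written as a few lines of plain Python. This is a minimal sketch, not a reference implementation: the function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention for this sketch: a zero vector has no direction,
        # so report 0.0 instead of dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity((1, 0), (2, 0)))   # same direction
print(cosine_similarity((1, 0), (0, 3)))   # orthogonal
print(cosine_similarity((1, 0), (-4, 0)))  # opposite direction
```

The three calls cover the two ends and the midpoint of the range: same direction, orthogonal, and opposite direction.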
Why Cosine Similarity?
In information retrieval, cosine similarity depends only on the angle between document vectors, not on their magnitude. A long document and a short document on the same topic get a high score because only the **direction** matters, not the length.
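The length independence can be checked directly. In the sketch below, the vectors `short_doc`, `long_doc`, and `query` are made-up illustrations: the long document has ten times the term counts of the short one, in the same proportions.

```python
import math

def unit(v):
    """Scale v to length 1 (assumes v is not the zero vector)."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def cosine(u, v):
    # The dot product of two unit vectors is the cosine of the angle between them.
    return sum(x * y for x, y in zip(unit(u), unit(v)))

short_doc = (2, 0, 1)    # hypothetical term counts
long_doc = (20, 0, 10)   # same proportions, ten times longer
query = (1, 0, 1)

sim_short = cosine(query, short_doc)
sim_long = cosine(query, long_doc)
print(math.isclose(sim_short, sim_long))
```

Normalizing each vector to unit length first makes the point explicit: once length is divided out, the two documents are the same vector.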
2. Compute the Norm of A
Given query vector A = (1, 0, 1): ||A|| = √(1² + 0² + 1²) = √2 ≈ 1.414
3. Compute Dot Products
For each candidate document vector B = (b₁, b₂, b₃), the dot product simplifies because the middle component of A is zero: A · B = b₁ + b₃. Dividing by ||A|| · ||B|| then gives cos(θ):
| Document | Vector B | A·B | \|\|B\|\| | cos(θ) |
|---|---|---|---|---|
| Doc 1 | (1, 1, 0) | 1 | √2 | 0.500 |
| Doc 2 | (1, 0, 1) | 2 | √2 | 1.000 |
| Doc 3 | (0, 1, 1) | 1 | √2 | 0.500 |
| Doc 4 | (1, 1, 1) | 2 | √3 | 0.816 |
| Doc 5 | (0, 0, 1) | 1 | 1 | 0.707 |
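The table can be reproduced with a short script. This sketch assumes the query vector is A = (1, 0, 1), the only vector with a zero middle component that matches the listed dot products. The helper names `dot` and `norm` are illustrative.

```python
import math

A = (1, 0, 1)  # assumed query vector, inferred from the table's dot products

docs = {
    "Doc 1": (1, 1, 0),
    "Doc 2": (1, 0, 1),
    "Doc 3": (0, 1, 1),
    "Doc 4": (1, 1, 1),
    "Doc 5": (0, 0, 1),
}

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

similarities = {}
for name, b in docs.items():
    similarities[name] = dot(A, b) / (norm(A) * norm(b))
    print(f"{name}: A·B = {dot(A, b)}, cos = {similarities[name]:.3f}")
```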
4. Rank by Similarity
Ranking documents by descending cosine similarity gives the recommendation order:
1st: Doc 2 (1, 0, 1), cos = 1.000
2nd: Doc 4 (1, 1, 1), cos = 0.816
3rd: Doc 5 (0, 0, 1), cos = 0.707
4th (tie): Doc 1 (1, 1, 0), cos = 0.500
4th (tie): Doc 3 (0, 1, 1), cos = 0.500
Top recommendation: Doc 2
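Ranking is a sort on the similarity scores. The sketch below starts from the rounded values in the table and gives tied documents the same rank. Breaking ties by document name is an assumption made only to keep the output order deterministic.

```python
scores = {
    "Doc 1": 0.500,
    "Doc 2": 1.000,
    "Doc 3": 0.500,
    "Doc 4": 0.816,
    "Doc 5": 0.707,
}

# Sort by descending score; ties fall back to the document name.
ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Tied documents share the rank of the first document in the tie.
ranking = []
for position, (name, score) in enumerate(ordered, start=1):
    if ranking and score == ranking[-1][2]:
        rank = ranking[-1][0]
    else:
        rank = position
    ranking.append((rank, name, score))

for rank, name, score in ranking:
    print(f"{rank}. {name} (cos = {score:.3f})")
```

A production system would usually return only the top few entries of `ordered` and would not recommend the query document back to the user.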
5. Key Concepts
TF-IDF Weighting
In practice, vector components use TF-IDF weights (Term Frequency × Inverse Document Frequency) rather than binary 0/1 values. This gives higher weight to terms that are distinctive to a document.
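As a rough illustration, the sketch below computes one common TF-IDF variant: term frequency as count divided by document length, and IDF as log(N / df). The toy corpus and the function name `tf_idf` are invented for this example, and real systems often use smoothed or otherwise adjusted formulas.

```python
import math
from collections import Counter

# Hypothetical toy corpus: document id -> list of tokens.
corpus = {
    "d1": "space rocket launch".split(),
    "d2": "space telescope image".split(),
    "d3": "rocket engine test".split(),
}

def tf_idf(corpus):
    n_docs = len(corpus)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in corpus.values() for term in set(tokens))
    weights = {}
    for doc_id, tokens in corpus.items():
        counts = Counter(tokens)
        weights[doc_id] = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights

weights = tf_idf(corpus)
for term, weight in sorted(weights["d1"].items()):
    print(f"{term}: {weight:.3f}")
```

In `d1`, the term "launch" appears in no other document, so it gets a larger weight than "space" or "rocket", which each appear in two documents.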
Sparsity Optimization
Real document vectors have thousands of dimensions (one per vocabulary term) but are extremely sparse. Inverted indices skip zero-valued dimensions, making cosine similarity fast even for large vocabularies.
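The idea can be sketched with dictionaries that store only nonzero weights. Everything named here (`docs`, `index`, `norms`, `score`) is hypothetical. The point is that scoring touches only documents that share at least one term with the query.

```python
import math
from collections import defaultdict

# Hypothetical sparse vectors: only nonzero term weights are stored.
docs = {
    "d1": {"space": 1.0, "rocket": 1.0},
    "d2": {"space": 1.0, "telescope": 2.0},
    "d3": {"engine": 1.0},
}

# Inverted index: term -> list of (doc_id, weight) postings.
index = defaultdict(list)
for doc_id, vec in docs.items():
    for term, weight in vec.items():
        index[term].append((doc_id, weight))

# Document norms are precomputed once, at indexing time.
norms = {d: math.sqrt(sum(w * w for w in vec.values())) for d, vec in docs.items()}

def score(query):
    """Cosine similarity for every document sharing a term with the query."""
    dots = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, weight in index.get(term, []):
            dots[doc_id] += q_weight * weight
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    return {d: total / (q_norm * norms[d]) for d, total in dots.items()}

result = score({"space": 1.0, "rocket": 1.0})
print(result)
```

Document `d3` shares no term with the query, so it is never visited and does not appear in `result`.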
Quiz
Test your understanding with these questions.
1. What is the Euclidean norm of the query vector A?
2. Cosine similarity ranges between which values?