
Content-Based Recommender System Built Using a Vector Space Model

Question

A content-based recommender system is built using a vector space model. Given document vector $A = (1, 0, 1)$, compute the cosine similarity with other document vectors and rank the recommendations.

Computing cosine similarity in a vector space model for recommendations.

Similarity Formula

cos(θ) = (A · B) / (||A|| · ||B||) — higher values = more similar

Step 1: Recall Cosine Similarity

Cosine similarity between vectors $A$ and $B$ is:

$$\cos(\theta) = \frac{A \cdot B}{\|A\| \cdot \|B\|}$$

where $A \cdot B$ is the dot product and $\|A\|$ is the Euclidean norm. Values range from $-1$ (opposite direction) to $1$ (identical direction).

Why Cosine Similarity?

In information retrieval, cosine similarity measures the angle between document vectors regardless of their magnitude. A long document and a short document on the same topic receive a high score because only the **direction** of the vector matters, not its length.
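To make the formula concrete, here is a minimal pure-Python sketch of cosine similarity (the function name `cosine_similarity` is illustrative, not from any particular library):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Scaling a vector does not change the score: direction matters, not length.
print(cosine_similarity((1, 0, 1), (3, 0, 3)))  # 1.0
```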

Step 2: Compute the Norm of A

Given query vector $A = (1, 0, 1)$:

$$\|A\| = \sqrt{1^2 + 0^2 + 1^2} = \sqrt{2} \approx 1.414$$
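As a quick numerical check, the same computation in Python:

```python
import math

A = (1, 0, 1)
print(math.sqrt(sum(x * x for x in A)))  # 1.4142135623730951, i.e. √2
```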

Step 3: Compute Dot Products

For each candidate document vector $B_i$, the dot product simplifies because the middle component of $A$ is zero:

$$A \cdot B_i = 1 \cdot b_1 + 0 \cdot b_2 + 1 \cdot b_3 = b_1 + b_3$$
| Document | Vector B | A·B | ‖B‖ | cos(θ) |
|----------|----------|-----|-----|--------|
| Doc 1 | (1, 1, 0) | 1 | √2 | 0.500 |
| Doc 2 | (1, 0, 1) | 2 | √2 | 1.000 |
| Doc 3 | (0, 1, 1) | 1 | √2 | 0.500 |
| Doc 4 | (1, 1, 1) | 2 | √3 | 0.816 |
| Doc 5 | (0, 0, 1) | 1 | 1 | 0.707 |
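The table values can be reproduced by looping over the candidates, reusing the `cosine_similarity` sketch from Step 1 (the document vectors below simply restate the table):

```python
A = (1, 0, 1)
docs = {
    "Doc 1": (1, 1, 0),
    "Doc 2": (1, 0, 1),
    "Doc 3": (0, 1, 1),
    "Doc 4": (1, 1, 1),
    "Doc 5": (0, 0, 1),
}

for name, B in docs.items():
    dot = sum(x * y for x, y in zip(A, B))
    print(f"{name}: A·B = {dot}, cos = {cosine_similarity(A, B):.3f}")
# Doc 1: A·B = 1, cos = 0.500
# Doc 2: A·B = 2, cos = 1.000
# Doc 3: A·B = 1, cos = 0.500
# Doc 4: A·B = 2, cos = 0.816
# Doc 5: A·B = 1, cos = 0.707
```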

Step 4: Rank by Similarity

Ranking documents by descending cosine similarity gives the recommendation order:
1st: Doc 2 (1, 0, 1), cos = 1.000
2nd: Doc 4 (1, 1, 1), cos = 0.816
3rd: Doc 5 (0, 0, 1), cos = 0.707
4th (tie): Doc 1 (1, 1, 0), cos = 0.500
4th (tie): Doc 3 (0, 1, 1), cos = 0.500

Top recommendation: Doc 2
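Continuing the same sketch, the ranking is a single `sorted` call; Python's sort is stable, so the tied documents (Doc 1 and Doc 3) keep their original order:

```python
ranked = sorted(docs.items(),
                key=lambda kv: cosine_similarity(A, kv[1]),
                reverse=True)
for rank, (name, vec) in enumerate(ranked, start=1):
    print(f"{rank}. {name} {vec}")
# 1. Doc 2 (1, 0, 1)   <- top recommendation
# 2. Doc 4 (1, 1, 1)
# 3. Doc 5 (0, 0, 1)
# 4. Doc 1 (1, 1, 0)
# 5. Doc 3 (0, 1, 1)
```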

Step 5: Key Concepts

TF-IDF Weighting

In practice, vector components use TF-IDF weights (Term Frequency × Inverse Document Frequency) rather than binary 0/1 values. This gives higher weight to terms that are distinctive to a document.
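As a hedged sketch of what this looks like in practice, scikit-learn's `TfidfVectorizer` builds TF-IDF document vectors directly (the three-document corpus here is invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "vector space models for document retrieval",   # the "query" document
    "vector space retrieval and ranking",
    "neural networks for image recognition",
]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(corpus)  # sparse matrix, one row per document

# Cosine similarity of the first document against all documents;
# the topically related second document scores far above the third.
print(cosine_similarity(tfidf[0], tfidf))
```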

Sparsity Optimization

Real document vectors have thousands of dimensions (one per vocabulary term) but are extremely sparse. Inverted indices skip zero-valued dimensions, making cosine similarity fast even for large vocabularies.
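A toy illustration of the idea, assuming binary weights and sparse vectors stored as dicts keyed by term index (real systems use far more optimized inverted-index structures):

```python
# Sparse query vector: only nonzero dimensions are stored.
A = {0: 1.0, 2: 1.0}  # corresponds to (1, 0, 1)

# Inverted index: term index -> postings list of (doc id, weight).
index = {
    0: [("Doc 1", 1.0), ("Doc 2", 1.0), ("Doc 4", 1.0)],
    1: [("Doc 1", 1.0), ("Doc 3", 1.0), ("Doc 4", 1.0)],
    2: [("Doc 2", 1.0), ("Doc 3", 1.0), ("Doc 4", 1.0), ("Doc 5", 1.0)],
}

# Accumulate dot products by walking only the query's nonzero terms;
# documents sharing no terms with the query are never touched.
scores = {}
for term, q_weight in A.items():
    for doc, d_weight in index.get(term, []):
        scores[doc] = scores.get(doc, 0.0) + q_weight * d_weight
print(scores)  # matches the A·B column; dividing by the norms gives cos(θ)
```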

Quiz

Test your understanding with these questions.

1. What is the Euclidean norm of vector $A = (1, 0, 1)$?
2. Cosine similarity ranges between which values?