Last modified: May 28, 2024
This article is written in: 🇺🇸
Recommendation systems are a fundamental component in the interface between users and large-scale content providers like Amazon, eBay, and iTunes. These systems personalize user experiences by suggesting products, movies, or content based on past interactions and preferences.
Imagine a company that sells movies and allows users to rate them. Consider the following setup:
Notations:
Example for "Cute Puppies of Love":
$$ x^{(3)} = \begin{bmatrix} 1 \\ 0.99 \\ 0 \end{bmatrix} $$
Example for user 1 (Alice) and her preference for "Cute Puppies of Love":
$$ \theta^{(1)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix} $$
Imagine a company that sells movies and allows users to rate them. Consider the following setup:
Notations:
Example for "Cute Puppies of Love":
$$ x^{(3)} = \begin{bmatrix} 1 \\ 0.99 \\ 0 \end{bmatrix} $$
Example for user 1 (Alice) and her preference for "Cute Puppies of Love":
$$ \theta^{(1)} = \begin{bmatrix} 0 \\ 5 \\ 0 \end{bmatrix} $$
This implementation demonstrates how to set up a basic collaborative filtering model for movie rating prediction using Python.
import numpy as np
# Number of users and movies
n_u = 4
n_m = 5
# User ratings matrix (0 if not rated)
ratings = np.array([
[5, 4, 0, 0, 3],
[0, 0, 4, 0, 4],
[0, 0, 5, 3, 3],
[3, 4, 0, 0, 5]
])
# Indicator matrix r(i, j) = 1 if rated, 0 otherwise
R = (ratings != 0).astype(int)
# Initialize random movie feature vectors (3 features per movie)
X = np.random.rand(n_m, 3)
# Initialize random user parameter vectors (3 parameters per user)
Theta = np.random.rand(n_u, 3)
def collaborative_filtering_cost(X, Theta, ratings, R, lambda_ = 0.1):
predictions = X.dot(Theta.T)
error = (predictions - ratings) * R
cost = 0.5 * np.sum(error**2)
# Regularization
cost += (lambda_ / 2) * (np.sum(Theta**2) + np.sum(X**2))
return cost
cost = collaborative_filtering_cost(X, Theta, ratings, R)
print(f'Initial cost: {cost}')
def collaborative_filtering_gradient(X, Theta, ratings, R, alpha=0.01, lambda_ = 0.1, iterations=1000):
X_grad = np.zeros(X.shape)
Theta_grad = np.zeros(Theta.shape)
for _ in range(iterations):
predictions = X.dot(Theta.T)
error = (predictions - ratings) * R
X_grad = error.dot(Theta) + lambda_ * X
Theta_grad = error.T.dot(X) + lambda_ * Theta
X -= alpha * X_grad
Theta -= alpha * Theta_grad
cost = collaborative_filtering_cost(X, Theta, ratings, R, lambda_)
if _ % 100 == 0:
print(f'Iteration {_}: cost = {cost}')
return X, Theta
X, Theta = collaborative_filtering_gradient(X, Theta, ratings, R)
To predict how much Alice might like "Cute Puppies of Love", we compute:
$$ (\theta^{(1)})^T x^{(3)} = (0 \cdot 1) + (5 \cdot 0.99) + (0 \cdot 0) = 4.95 $$
This predicts a high rating of 4.95 out of 5 for Alice for this movie, based on her preference parameters and the movie's features.
Here is an example implementation in Python:
import numpy as np
# Feature vector for the movie "Cute Puppies of Love"
x_cute_puppies = np.array([1, 0.99, 0])
# Parameter vector for Alice
theta_alice = np.array([0, 5, 0])
# Compute the dot product
predicted_rating = np.dot(theta_alice, x_cute_puppies)
# Print the predicted rating
print(f"Predicted rating for Alice for 'Cute Puppies of Love': {predicted_rating}")
When you run this code, it will output:
Predicted rating for Alice for 'Cute Puppies of Love': 4.95
In recommendation systems, learning user preferences $(\theta^j)$ is a key component. This process is akin to linear regression, but with a unique approach tailored for recommendation contexts.
$$ \min_{\theta^j} = \frac{1}{2m^{(j)}} \sum_{i:r(i,j)=1} \left((\theta^{(j)})^T(x^{(i)}) - y^{(i,j)}\right)^2 + \frac{\lambda}{2m^{(j)}} \sum_{k=1}^{n} (\theta_k^{(j)})^2 $$
Update Rule for $\theta_k^{(j)}$:
$$ \theta_k^{(j)} := \theta_k^{(j)} - \alpha \sum_{i:r(i,j)=1} \left((\theta^{(j)})^T(x^{(i)}) - y^{(i,j)}\right) x_k^{(i)} $$
$$ \theta_k^{(j)} := \theta_k^{(j)} - \alpha \left(\sum_{i:r(i,j)=1} \left((\theta^{(j)})^T(x^{(i)}) - y^{(i,j)}\right) x_k^{(i)} + \lambda \theta_k^{(j)}\right) $$
Collaborative filtering leverages the interplay between user preferences and item (movie) features:
$$ \min_{x^{(1)}, ..., x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{i:r(i,j)=1} \left((\theta^{(j)})^T(x^{(i)}) - y^{(i,j)}\right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} (x_k^{(i)})^2 $$
Here, we'll implement a basic collaborative filtering model using gradient descent to predict user ratings for movies.
First, we need to set up our data, including user ratings and an indicator matrix showing which movies have been rated by which users.
import numpy as np
# Number of users and movies
n_u = 4
n_m = 5
# User ratings matrix (0 if not rated)
ratings = np.array([
[5, 4, 0, 0, 3],
[0, 0, 4, 0, 4],
[0, 0, 5, 3, 3],
[3, 4, 0, 0, 5]
])
# Indicator matrix r(i, j) = 1 if rated, 0 otherwise
R = (ratings != 0).astype(int)
We initialize random feature vectors for movies and parameter vectors for users.
# Initialize random movie feature vectors (3 features per movie)
X = np.random.rand(n_m, 3)
# Initialize random user parameter vectors (3 parameters per user)
Theta = np.random.rand(n_u, 3)
We define the cost function for collaborative filtering, including regularization.
def collaborative_filtering_cost(X, Theta, ratings, R, lambda_=0.1):
predictions = X.dot(Theta.T)
error = (predictions - ratings) * R
cost = 0.5 * np.sum(error ** 2)
# Regularization
cost += (lambda_ / 2) * (np.sum(Theta ** 2) + np.sum(X ** 2))
return cost
cost = collaborative_filtering_cost(X, Theta, ratings, R)
print(f'Initial cost: {cost}')
We implement gradient descent to optimize the feature and parameter vectors.
def collaborative_filtering_gradient(X, Theta, ratings, R, alpha=0.01, lambda_=0.1, iterations=1000):
for _ in range(iterations):
predictions = X.dot(Theta.T)
error = (predictions - ratings) * R
X_grad = error.dot(Theta) + lambda_ * X
Theta_grad = error.T.dot(X) + lambda_ * Theta
X -= alpha * X_grad
Theta -= alpha * Theta_grad
if _ % 100 == 0:
cost = collaborative_filtering_cost(X, Theta, ratings, R, lambda_)
print(f'Iteration {_}: cost = {cost}')
return X, Theta
X, Theta = collaborative_filtering_gradient(X, Theta, ratings, R)
Finally, we use the optimized feature and parameter vectors to predict ratings for all users and movies.
# Predict ratings for all users and movies
predictions = X.dot(Theta.T)
# Print predictions
print("Predicted ratings:")
print(predictions)
Example $[5 \times 4]$ Matrix Y for 5 movies and 4 users:
$$ Y = \begin{pmatrix} 5 & 5 & 0 & 0 \\ 5 & ? & ? & 0 \\ ? & 4 & 0 & ? \\ 0 & 0 & 5 & 4 \\ 0 & 0 & 5 & 0 \end{pmatrix} $$
$$ X \Theta^T = \begin{pmatrix} (\theta^{(1)})^T(x^{(1)}) & \dots & (\theta^{(n_u)})^T(x^{(1)}) \\ \vdots & \ddots & \vdots \\ (\theta^{(1)})^T(x^{(n_m)}) & \dots & (\theta^{(n_u)})^T(x^{(n_m)}) \end{pmatrix} $$
To implement the low-rank matrix factorization for collaborative filtering, we need to set up the necessary matrices and perform the matrix multiplication to predict ratings. Here's how to do it in Python.
Organize all user ratings into a matrix $Y$, which represents the interactions between users and movies.
import numpy as np
# Example [5 x 4] Matrix Y for 5 movies and 4 users
Y = np.array([
[5, 5, 0, 0],
[5, 0, 0, 0],
[0, 4, 0, 0],
[0, 0, 5, 4],
[0, 0, 5, 0]
])
print("Matrix Y:")
print(Y)
Matrix $X$ contains the features for each movie, and matrix $\Theta$ contains the user preferences.
# Number of movies (n_m) and users (n_u)
n_m, n_u = Y.shape
# Number of features
n_features = 3
# Initialize movie feature matrix X and user preference matrix Theta with random values
X = np.random.rand(n_m, n_features)
Theta = np.random.rand(n_u, n_features)
print("\nInitial Matrix X (Movie Features):")
print(X)
print("\nInitial Matrix Θ (User Preferences):")
print(Theta)
The predicted ratings matrix can be expressed as the product of matrices $X$ and $\Theta^T$.
# Compute the predicted ratings
predicted_ratings = X.dot(Theta.T)
print("\nPredicted Ratings Matrix XΘ^T:")
print(predicted_ratings)
This Python code demonstrates the setup of matrices $Y$, $X$, and $\Theta$, and the calculation of the predicted ratings matrix using matrix multiplication. The matrix $X$ contains the features for each movie, and $\Theta$ contains the user preferences. The predicted ratings matrix is obtained by multiplying $X$ with the transpose of $\Theta$.
Imagine a user in a movie recommendation system who hasn't rated any movies. This scenario presents a challenge for typical collaborative filtering algorithms.
Example $\mu$ vector for a system with 5 movies:
$$ \mu = \begin{bmatrix} 2.5 \\ 2.5 \\ 2 \\ 2.25 \\ 1.25 \end{bmatrix} $$
Normalized Ratings Matrix $Y$:
$$ Y = \begin{pmatrix} 2.5 & 2.5 & -2.5 & -2.5 & ? \\ 2.5 & ? & ? & -2.5 & ? \\ ? & 2 & -2 & ? & ? \\ -2.25 & -2.25 & 2.75 & 1.75 & ? \\ -1.25 & -1.25 & 3.75 & -1.25 & ? \end{pmatrix} $$
Imagine a user in a movie recommendation system who hasn't rated any movies. This scenario presents a challenge for typical collaborative filtering algorithms.
For such a user, there are no movies for which $r(i, j) = 1$. As a result, the algorithm ends up minimizing only the regularization term for this user, which doesn't provide meaningful insight for recommendations.
Example $\mu$ vector for a system with 5 movies:
$$ \mu = \begin{bmatrix} 2.5 \\ 2.5 \\ 2 \\ 2.25 \\ 1.25 \end{bmatrix} $$
Normalized Ratings Matrix $Y$:
$$ Y = \begin{pmatrix} 2.5 & 2.5 & -2.5 & -2.5 & ? \\ 2.5 & ? & ? & -2.5 & ? \\ ? & 2 & -2 & ? & ? \\ -2.25 & -2.25 & 2.75 & 1.75 & ? \\ -1.25 & -1.25 & 3.75 & -1.25 & ? \end{pmatrix} $$
import numpy as np
# Example [5 x 4] Matrix Y for 5 movies and 4 users
Y = np.array([
[5, 5, 0, 0],
[5, 0, 0, 0],
[0, 4, 0, 0],
[0, 0, 5, 4],
[0, 0, 5, 0]
])
# Compute the mean rating for each movie, ignoring zeros
mean_ratings = np.true_divide(Y.sum(1), (Y != 0).sum(1))
print("Mean Ratings Vector μ:")
print(mean_ratings)
# Subtract the mean rating for each movie from all its ratings
Y_normalized = Y - mean_ratings[:, np.newaxis]
# Ensure we do not subtract from places where rating was 0
Y_normalized[Y == 0] = 0
print("\nNormalized Ratings Matrix Y:")
print(Y_normalized)
For a new user with no ratings, their predicted rating for each movie can be initialized to the mean rating of that movie.
# Predicted ratings for a new user with no ratings
new_user_ratings = mean_ratings.copy()
print("\nPredicted Ratings for New User with No Ratings:")
print(new_user_ratings)
These notes are based on the free video lectures offered by Stanford University, led by Professor Andrew Ng. These lectures are part of the renowned Machine Learning course available on Coursera. For more information and to access the full course, visit the Coursera course page.