C3_W2_Collaborative_RecSys_Assignment_吳恩達_中英

Practice lab: Collaborative Filtering Recommender Systems

In this exercise, you will implement collaborative filtering to build a recommender system for movies.

Outline

1 - Notation
2 - Recommender Systems
3 - Movie ratings dataset
4 - Collaborative filtering learning algorithm
- 4.1 Collaborative filtering cost function
  - Exercise 1
5 - Learning movie recommendations
6 - Recommendations
7 - Congratulations!

Packages

We will use the now-familiar NumPy and TensorFlow packages.

import numpy as np
import tensorflow as tf
from tensorflow import keras
from recsys_utils import *

1 - Notation

General Notation	Description	Python (if any)
$r (i, j)$	scalar; = 1 if user j rated movie i, = 0 otherwise
$y (i, j)$	scalar; = rating given by user j on movie i (defined only if r(i,j) = 1)
$\mathbf{w}^{(j)}$	vector; parameters for user j
$b^{(j)}$	scalar; parameter for user j
$\mathbf{x}^{(i)}$	vector; feature ratings for movie i
$n_u$	number of users	num_users
$n_m$	number of movies	num_movies
$n$	number of features	num_features
$\mathbf{X}$	matrix of vectors $\mathbf{x}^{(i)}$	X
$\mathbf{W}$	matrix of vectors $\mathbf{w}^{(j)}$	W
$\mathbf{b}$	vector of bias parameters $b^{(j)}$	b
$\mathbf{R}$	matrix of elements $r (i, j)$	R

2 - Recommender Systems

In this lab, you will implement the collaborative filtering learning algorithm and apply it to a dataset of movie ratings.
The goal of a collaborative filtering recommender system is to generate two vectors: For each user, a parameter vector that embodies the movie tastes of a user. For each movie, a feature vector of the same size which embodies some description of the movie. The dot product of the two vectors plus the bias term should produce an estimate of the rating the user might give to that movie.
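As a small illustration of the prediction rule described above, the sketch below computes one estimated rating from a user parameter vector, a movie feature vector, and a user bias. The numbers and the names `w_j`, `b_j`, and `x_i` are hypothetical and chosen only for illustration; they are not taken from the lab's data.

```python
import numpy as np

# Hypothetical values, chosen only for illustration (n = 3 features).
w_j = np.array([0.5, -1.0, 2.0])   # parameter vector of user j
b_j = 0.25                         # bias term of user j
x_i = np.array([1.0, 0.5, 1.5])    # feature vector of movie i

# Estimated rating: dot product of the two vectors plus the user's bias.
predicted_rating = np.dot(w_j, x_i) + b_j
print(predicted_rating)
```

Both vectors must have the same length for the dot product to be defined, which is why the user and movie vectors share the size $n$.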

The diagram below details how these vectors are learned.

[Figure: how the user parameter vectors and movie feature vectors are learned from the ratings matrix]

Existing ratings are provided in matrix form as shown. $Y$ contains ratings from 0.5 to 5 inclusive in steps of 0.5, with 0 where a movie has not been rated. $R$ has a 1 where a movie has been rated. Movies are in rows, users in columns. Each user has a parameter vector $w^{user}$ and a bias. Each movie has a feature vector $x^{movie}$ . These vectors are learned simultaneously, using the existing user/movie ratings as training data. One training example is shown above: $\mathbf{w}^{(1)} \cdot \mathbf{x}^{(1)} + b^{(1)} = 4$ . It is worth noting that the feature vector $x^{movie}$ must satisfy all the users while the user vector $w^{user}$ must satisfy all the movies. This is the source of the name of this approach: all the users collaborate to generate the rating set.

[Figure: predicting a user's rating for an unrated movie from the learned vectors]

Once the feature vectors and parameters are learned, they can be used to predict how a user might rate an unrated movie. This is shown in the diagram above. The equation is an example of predicting a rating for user one on movie zero.

In this exercise, you will implement the function cofi_cost_func, which computes the collaborative filtering objective function. After implementing the objective function, you will use a TensorFlow custom training loop to learn the parameters for collaborative filtering. The first step is to detail the data set and data structures that will be used in the lab.

3 - Movie ratings dataset

The data set is derived from the MovieLens “ml-latest-small” dataset.
[F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872]

The original dataset has 9000 movies rated by 600 users. The dataset has been reduced in size to focus on movies from the years since 2000. This dataset consists of ratings on a scale of 0.5 to 5 in 0.5 step increments. The reduced dataset has $n_u = 443$ users, and $n_m= 4778$ movies.

Below, you will load the movie dataset into the variables $Y$ and $R$ .

The matrix $Y$ (an $n_m \times n_u$ matrix) stores the ratings $y^{(i,j)}$ . The matrix $R$ is a binary-valued indicator matrix, where $R (i, j) = 1$ if user $j$ gave a rating to movie $i$ , and $R (i, j) = 0$ otherwise.
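To make the relationship between $Y$ and $R$ concrete, here is a minimal sketch on a hypothetical 3-movie by 4-user matrix (the values and the names `Y_toy` and `R_toy` are made up for illustration). Because valid ratings start at 0.5, a 0 in $Y$ can only mean "not rated", so the indicator matrix can be derived from it.

```python
import numpy as np

# Hypothetical ratings: rows are movies, columns are users, 0 means "not rated".
Y_toy = np.array([[5.0, 0.0, 3.5, 0.0],
                  [0.0, 0.0, 4.0, 1.0],
                  [2.5, 4.5, 0.0, 0.0]])

# Indicator matrix: 1 where a rating exists, 0 otherwise.
R_toy = (Y_toy != 0).astype(int)

ratings_per_movie = R_toy.sum(axis=1)   # how many users rated each movie
ratings_per_user = R_toy.sum(axis=0)    # how many movies each user rated

print(R_toy)
print(ratings_per_movie, ratings_per_user)
```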

Throughout this part of the exercise, you will also be working with the matrices $\mathbf{X}$ , $\mathbf{W}$ and $\mathbf{b}$ :

$\mathbf{X} = \begin{bmatrix} --- (\mathbf{x}^{(0)})^T --- \\ --- (\mathbf{x}^{(1)})^T --- \\ \vdots \\ --- (\mathbf{x}^{(n_m-1)})^T --- \\ \end{bmatrix} , \quad \mathbf{W} = \begin{bmatrix} --- (\mathbf{w}^{(0)})^T --- \\ --- (\mathbf{w}^{(1)})^T --- \\ \vdots \\ --- (\mathbf{w}^{(n_u-1)})^T --- \\ \end{bmatrix},\quad \mathbf{ b} = \begin{bmatrix} b^{(0)} \\ b^{(1)} \\ \vdots \\ b^{(n_u-1)} \\ \end{bmatrix}\quad$

The $i$ -th row of $\mathbf{X}$ corresponds to the feature vector $\mathbf{x}^{(i)}$ for the $i$ -th movie, and the $j$ -th row of $\mathbf{W}$ corresponds to the parameter vector $\mathbf{w}^{(j)}$ for the $j$ -th user. Both $\mathbf{x}^{(i)}$ and $\mathbf{w}^{(j)}$ are $n$ -dimensional vectors. For the purposes of this exercise, you will use $n = 10$ , and therefore $\mathbf{x}^{(i)}$ and $\mathbf{w}^{(j)}$ have 10 elements. Correspondingly, $\mathbf{X}$ is an $n_m \times 10$ matrix and $\mathbf{W}$ is an $n_u \times 10$ matrix.
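With the row layout described above, all predictions can be formed at once as $\mathbf{X}\mathbf{W}^T + \mathbf{b}$. The sketch below checks this on random toy matrices (sizes and names such as `X_toy` are assumptions for illustration only): entry $(i, j)$ of the result should equal $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_u, n = 5, 4, 3                 # hypothetical toy sizes

X_toy = rng.normal(size=(n_m, n))     # one row per movie
W_toy = rng.normal(size=(n_u, n))     # one row per user
b_toy = rng.normal(size=(1, n_u))     # one bias per user

# (n_m, n) @ (n, n_u) gives (n_m, n_u); b_toy broadcasts across the rows.
P = X_toy @ W_toy.T + b_toy

# The same prediction computed for a single movie/user pair.
i, j = 2, 1
single = np.dot(W_toy[j], X_toy[i]) + b_toy[0, j]

print(P.shape, np.isclose(P[i, j], single))
```

Storing $\mathbf{b}$ with shape `(1, n_u)` is what lets it broadcast over movies, so every row of the prediction matrix receives the same per-user biases.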

We will start by loading the movie ratings dataset to understand the structure of the data.

We will load $Y$ and $R$ with the movie dataset.

We’ll also load $\mathbf{X}$ , $\mathbf{W}$ , and $\mathbf{b}$ with pre-computed values. These values will be learned later in the lab, but we’ll use pre-computed values to develop the cost model.

#Load data
X, W, b, num_movies, num_features, num_users = load_precalc_params_small()
Y, R = load_ratings_small()

print("Y", Y.shape, "R", R.shape)
print("X", X.shape)
print("W", W.shape)
print("b", b.shape)
print("num_features", num_features)
print("num_movies",   num_movies)
print("num_users",    num_users)

Y (4778, 443) R (4778, 443)
X (4778, 10)
W (443, 10)
b (1, 443)
num_features 10
num_movies 4778
num_users 443

#  From the matrix, we can compute statistics like average rating.
tsmean =  np.mean(Y[0, R[0, :].astype(bool)])
print(f"Average rating for movie 1 : {tsmean:0.3f} / 5" )

Average rating for movie 1 : 3.400 / 5

4 - Collaborative filtering learning algorithm

Now, you will begin implementing the collaborative filtering learning algorithm. You will start by implementing the objective function.

The collaborative filtering algorithm in the setting of movie recommendations considers a set of $n$ -dimensional parameter vectors
$\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}$ , $\mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$

where the model predicts the rating for movie $i$ by user $j$ as
$y^{(i,j)} = \mathbf{w}^{(j)}\cdot \mathbf{x}^{(i)} + b^{(j)}$

Given a dataset that consists of a set of ratings produced by some users on some movies, you wish to learn the parameter vectors $\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)}, \mathbf{w}^{(0)},...,\mathbf{w}^{(n_u-1)}$ and $b^{(0)},...,b^{(n_u-1)}$ that produce the best fit (minimize the squared error).

You will complete the code in cofi_cost_func to compute the cost function for collaborative filtering.

4.1 Collaborative filtering cost function

The collaborative filtering cost function is given by
$$J({\mathbf{x}^{(0)},...,\mathbf{x}^{(n_m-1)},\mathbf{w}^{(0)},b^{(0)},...,\mathbf{w}^{(n_u-1)},b^{(n_u-1)}})= \frac{1}{2}\sum_{(i,j):r(i,j)=1}(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2
+\underbrace{
\frac{\lambda}{2}
\sum_{j=0}^{n_u-1}\sum_{k=0}^{n-1}(\mathbf{w}^{(j)}_k)^2
+ \frac{\lambda}{2}\sum_{i=0}^{n_m-1}\sum_{k=0}^{n-1}(\mathbf{x}_k^{(i)})^2
}_{regularization}
\tag{1}$$
The first summation in (1) is “for all $i$ , $j$ where $r (i, j)$ equals $1$ ” and could be written:

$\frac{1}{2}\sum_{j=0}^{n_u-1} \sum_{i=0}^{n_m-1}r(i,j)*(\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)} - y^{(i,j)})^2 +\text{regularization}$
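The two ways of writing the first summation can be compared numerically. The sketch below, on hypothetical toy data (all names ending in `_toy` and the sizes are assumptions), computes the squared-error term once by visiting only rated pairs and once by visiting every pair and multiplying by $r(i,j)$; regularization is left out on purpose.

```python
import numpy as np

rng = np.random.default_rng(1)
n_m, n_u, n = 4, 3, 2                       # hypothetical toy sizes
X_toy = rng.normal(size=(n_m, n))
W_toy = rng.normal(size=(n_u, n))
b_toy = rng.normal(size=(1, n_u))
Y_toy = np.array([[4.0, 0.0, 2.5],
                  [0.0, 3.0, 0.0],
                  [5.0, 0.0, 0.0],
                  [0.0, 1.5, 4.5]])
R_toy = (Y_toy != 0).astype(int)

# Form 1: sum only over the pairs (i, j) with r(i, j) = 1.
cost_rated = 0.0
for i in range(n_m):
    for j in range(n_u):
        if R_toy[i, j] == 1:
            err = np.dot(W_toy[j], X_toy[i]) + b_toy[0, j] - Y_toy[i, j]
            cost_rated += err ** 2
cost_rated /= 2

# Form 2: sum over all pairs, weighting each squared error by r(i, j).
cost_masked = 0.0
for j in range(n_u):
    for i in range(n_m):
        err = np.dot(W_toy[j], X_toy[i]) + b_toy[0, j] - Y_toy[i, j]
        cost_masked += R_toy[i, j] * err ** 2
cost_masked /= 2

print(np.isclose(cost_rated, cost_masked))
```

The masked form is the one that generalizes to a vectorized implementation, since multiplying by $R$ elementwise replaces the `if` test.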

You should now write cofi_cost_func (collaborative filtering cost function) to return this cost.

Exercise 1

For loop Implementation:
Start by implementing the cost function using for loops.
Consider developing the cost function in two steps. First, develop the cost function without regularization. A test case that does not include regularization is provided below to test your implementation. Once that is working, add regularization and run the tests that include regularization. Note that you should be accumulating the cost for user $j$ and movie $i$ only if $R (i, j) = 1$ .

# GRADED FUNCTION: cofi_cost_func
# UNQ_C1

def cofi_cost_func(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    nm, nu = Y.shape
    J = 0
    ### START CODE HERE ###
    for j in range(nu):
        w = W[j,:]
        b_j = b[0,j]
        for i in range(nm):
            x = X[i,:]
            r = R[i,j]
            y = Y[i,j]
            # r is 0 or 1, so unrated pairs contribute nothing
            J += np.square(r * (np.dot(w,x) + b_j - y))
    # regularization; the final division by 2 applies to both terms
    J += lambda_ * (np.sum(np.square(W)) + np.sum(np.square(X)))
    J = J/2
    ### END CODE HERE ###

    return J

# Public tests
from public_tests import *
test_cofi_cost_func(cofi_cost_func);

All tests passed!

Hints: You can structure the code in two for loops similar to the summation in (1). Implement the code without regularization first. Note that some of the elements in (1) are vectors, so use np.dot(). You can also use np.square(). Pay close attention to which elements are indexed by i and which are indexed by j. Don't forget to divide by two.

    ### START CODE HERE ###
    for j in range(nu):
        for i in range(nm):
    ### END CODE HERE ###

More hints

Here are some more details. The code below pulls out each element from the matrix before using it. One could also reference the matrix directly. This code does not contain regularization.

    nm,nu = Y.shape
    J = 0
    ### START CODE HERE ###
    for j in range(nu):
        w = W[j,:]
        b_j = b[0,j]
        for i in range(nm):
            x =
            y =
            r =
            J +=
    J = J/2
    ### END CODE HERE ###

Last Resort (full non-regularized implementation)

    nm,nu = Y.shape
    J = 0
    ### START CODE HERE ###
    for j in range(nu):
        w = W[j,:]
        b_j = b[0,j]
        for i in range(nm):
            x = X[i,:]
            y = Y[i,j]
            r = R[i,j]
            J += np.square(r * (np.dot(w,x) + b_j - y ) )
    J = J/2
    ### END CODE HERE ###

Regularization hint: Regularization just squares each element of the W array and X array and then sums all the squared elements. You can utilize np.square() and np.sum(). Add the line below before the division by two:

    J += lambda_* (np.sum(np.square(W)) + np.sum(np.square(X)))

# Reduce the data set size so that this runs faster
num_users_r = 4
num_movies_r = 5
num_features_r = 3

X_r = X[:num_movies_r, :num_features_r]
W_r = W[:num_users_r,  :num_features_r]
b_r = b[0, :num_users_r].reshape(1,-1)
Y_r = Y[:num_movies_r, :num_users_r]
R_r = R[:num_movies_r, :num_users_r]

# Evaluate cost function
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 0);
print(f"Cost: {J:0.2f}")

Cost: 13.67

Expected Output (lambda = 0):
$13.67$ .

# Evaluate cost function with regularization 
J = cofi_cost_func(X_r, W_r, b_r, Y_r, R_r, 1.5)
print(f"Cost (with regularization): {J:0.2f}")

Cost (with regularization): 28.09

Expected Output:

28.09

Vectorized Implementation

It is important to create a vectorized implementation to compute $J$ , since it will later be called many times during optimization. The linear algebra utilized is not the focus of this series, so the implementation is provided. If you are an expert in linear algebra, feel free to create your version without referencing the code below.

Run the code below and verify that it produces the same results as the non-vectorized version.

def cofi_cost_func_v(X, W, b, Y, R, lambda_):
    """
    Returns the cost for collaborative filtering
    Vectorized for speed. Uses tensorflow operations to be compatible with custom training loop.
    Args:
      X (ndarray (num_movies,num_features)): matrix of item features
      W (ndarray (num_users,num_features)) : matrix of user parameters
      b (ndarray (1, num_users))           : vector of user parameters
      Y (ndarray (num_movies,num_users))   : matrix of user ratings of movies
      R (ndarray (num_movies,num_users))   : matrix, where R(i, j) = 1 if the i-th movie was rated by the j-th user
      lambda_ (float): regularization parameter
    Returns:
      J (float) : Cost
    """
    # prediction errors for every (movie, user) pair, zeroed where no rating exists
    j = (tf.linalg.matmul(X, tf.transpose(W)) + b - Y)*R
    J = 0.5 * tf.reduce_sum(j**2) + (lambda_/2) * (tf.reduce_sum(X**2) + tf.reduce_sum(W**2))
    return J

# Evaluate cost function
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 0);
print(f"Cost: {J:0.2f}")

# Evaluate cost function with regularization
J = cofi_cost_func_v(X_r, W_r, b_r, Y_r, R_r, 1.5);
print(f"Cost (with regularization): {J:0.2f}")

Cost: 13.67
Cost (with regularization): 28.09

Expected Output:
Cost: 13.67
Cost (with regularization): 28.09

5 - Learning movie recommendations

After you have finished implementing the collaborative filtering cost function, you can start training your algorithm to make movie recommendations for yourself.

In the cell below, you can enter your own movie choices. The algorithm will then make recommendations for you! We have filled out some values according to our preferences, but after you have things working with our choices, you should change this to match your tastes.
A list of all movies in the dataset is in the file small_movie_list.csv.

movieList, movieList_df = load_Movie_List_pd()

my_ratings = np.zeros(num_movies)          #  Initialize my ratings

# Check the file small_movie_list.csv for id of each movie in our dataset
# For example, Toy Story 3 (2010) has ID 2700, so to rate it "5", you can set
my_ratings[2700] = 5

# Or suppose you did not enjoy Persuasion (2007), you can set
my_ratings[2609] = 2

# We have selected a few movies we liked / did not like and the ratings we
# gave are as follows:
my_ratings[929]  = 5   # Lord of the Rings: The Return of the King, The
my_ratings[246]  = 3   # Shrek (2001)
my_ratings[2716] = 4   # Inception
my_ratings[1150] = 3   # Incredibles, The (2004)
my_ratings[382]  = 2   # Amelie (Fabuleux destin d'Amélie Poulain, Le)
my_ratings[366]  = 5   # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
my_ratings[622]  = 5   # Harry Potter and the Chamber of Secrets (2002)
my_ratings[988]  = 3   # Eternal Sunshine of the Spotless Mind (2004)
my_ratings[2925] = 1   # Louis Theroux: Law & Disorder (2008)
my_ratings[2937] = 1   # Nothing to Declare (Rien à déclarer)
my_ratings[793]  = 5   # Pirates of the Caribbean: The Curse of the Black Pearl (2003)
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

print('\nNew user ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0 :
        print(f'Rated {my_ratings[i]} for  {movieList_df.loc[i,"title"]}')

New user ratings:

Rated 3.0 for  Shrek (2001)
Rated 5.0 for  Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Rated 2.0 for  Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Rated 5.0 for  Harry Potter and the Chamber of Secrets (2002)
Rated 5.0 for  Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Rated 5.0 for  Lord of the Rings: The Return of the King, The (2003)
Rated 3.0 for  Eternal Sunshine of the Spotless Mind (2004)
Rated 3.0 for  Incredibles, The (2004)
Rated 2.0 for  Persuasion (2007)
Rated 5.0 for  Toy Story 3 (2010)
Rated 4.0 for  Inception (2010)
Rated 1.0 for  Louis Theroux: Law & Disorder (2008)
Rated 1.0 for  Nothing to Declare (Rien à déclarer) (2010)

Now, let’s add these ratings to $Y$ and $R$ and normalize the ratings.

# Reload ratings and add new ratings
Y, R = load_ratings_small()
Y    = np.c_[my_ratings, Y]
R    = np.c_[(my_ratings != 0).astype(int), R]

# Normalize the Dataset
Ynorm, Ymean = normalizeRatings(Y, R)
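The lab's `normalizeRatings` comes from `recsys_utils` and its source is not shown here. As a rough idea of what mean normalization does, the sketch below subtracts each movie's mean rating, computed over rated entries only. The function name `normalize_ratings_sketch` and its details are assumptions for illustration, not the lab's actual helper.

```python
import numpy as np

def normalize_ratings_sketch(Y, R):
    """Hypothetical helper: subtract each movie's mean over its rated entries."""
    # Number of ratings per movie; clamp to 1 to avoid dividing by zero.
    counts = np.maximum(R.sum(axis=1, keepdims=True), 1)
    Y_mean = (Y * R).sum(axis=1, keepdims=True) / counts   # shape (num_movies, 1)
    Y_norm = Y - Y_mean * R                                # unrated entries stay 0
    return Y_norm, Y_mean

# Toy data: 2 movies, 3 users, 0 means "not rated".
Y_toy = np.array([[5.0, 0.0, 3.0],
                  [0.0, 2.0, 4.0]])
R_toy = (Y_toy != 0).astype(int)

Y_norm, Y_mean = normalize_ratings_sketch(Y_toy, R_toy)
print(Y_mean.ravel())   # per-movie means
print(Y_norm)
```

Keeping the mean as a column vector means it can later be added back to a `(num_movies, num_users)` prediction matrix by broadcasting.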

Let’s prepare to train the model. Initialize the parameters and select the Adam optimizer.

#  Useful Values
num_movies, num_users = Y.shape
num_features = 100

# Set Initial Parameters (W, X), use tf.Variable to track these variables
tf.random.set_seed(1234) # for consistent results
W = tf.Variable(tf.random.normal((num_users,  num_features),dtype=tf.float64),  name='W')
X = tf.Variable(tf.random.normal((num_movies, num_features),dtype=tf.float64),  name='X')
b = tf.Variable(tf.random.normal((1,          num_users),   dtype=tf.float64),  name='b')

# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)

Let’s now train the collaborative filtering model. This will learn the parameters $\mathbf{X}$ , $\mathbf{W}$ , and $\mathbf{b}$ .

The operations involved in learning $w$ , $b$ , and $x$ simultaneously do not fall into the typical ‘layers’ offered in the TensorFlow neural network package. Consequently, the flow used in Course 2 (Model, Compile(), Fit(), Predict()) is not directly applicable. Instead, we can use a custom training loop.

Recall from earlier labs the steps of gradient descent.

repeat until convergence:
- compute forward pass
- compute the derivatives of the loss relative to parameters
- update the parameters using the learning rate and the computed derivatives
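The three steps above can be sketched on a deliberately tiny problem. The example below minimizes the hypothetical one-parameter cost $J(w) = (w - 3)^2$ with plain gradient descent and a hand-written derivative; it uses a fixed number of steps instead of a convergence test, and it is not the Adam update used later in this lab.

```python
# Plain gradient descent on J(w) = (w - 3)^2, whose minimum is at w = 3.
w = 0.0                 # initial parameter value
learning_rate = 0.1

for step in range(200):
    cost = (w - 3.0) ** 2            # forward pass: evaluate the cost
    grad = 2.0 * (w - 3.0)           # derivative of the cost with respect to w
    w = w - learning_rate * grad     # update using the learning rate and derivative

print(w, cost)
```

Here the derivative is written by hand; for the collaborative filtering cost, the same step is what an automatic differentiation tool takes over.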

TensorFlow has the marvelous capability of calculating the derivatives for you. This is shown below. Within the tf.GradientTape() section, operations on Tensorflow Variables are tracked. When tape.gradient() is later called, it will return the gradient of the loss relative to the tracked variables. The gradients can then be applied to the parameters using an optimizer.

This is a very brief introduction to a useful feature of TensorFlow and other machine learning frameworks. Further information can be found by investigating “custom training loops” within the framework of interest.

iterations = 200
lambda_ = 1
for iter in range(iterations):
    # Use TensorFlow’s GradientTape
    # to record the operations used to compute the cost
    with tf.GradientTape() as tape:

        # Compute the cost (forward pass included in cost)
        cost_value = cofi_cost_func_v(X, W, b, Ynorm, R, lambda_)

    # Use the gradient tape to automatically retrieve
    # the gradients of the trainable variables with respect to the loss
    grads = tape.gradient( cost_value, [X,W,b] )

    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients( zip(grads, [X,W,b]) )

    # Log periodically.
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:0.1f}")

Training loss at iteration 0: 2321158.0
Training loss at iteration 20: 136165.9
Training loss at iteration 40: 51862.0
Training loss at iteration 60: 24597.9
Training loss at iteration 80: 13629.8
Training loss at iteration 100: 8487.3
Training loss at iteration 120: 5807.5
Training loss at iteration 140: 4311.5
Training loss at iteration 160: 3435.2
Training loss at iteration 180: 2902.1
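For intuition about what automatic differentiation computes for this cost, the gradients can also be derived by hand. With $E = (XW^T + b - Y) \odot R$, differentiating the vectorized cost gives $\partial J/\partial X = EW + \lambda X$, $\partial J/\partial W = E^TX + \lambda W$, and $\partial J/\partial b$ equal to the column sums of $E$. The NumPy sketch below (function names `cofi_cost_np` and `cofi_grads_np` and the toy data are assumptions, not part of the lab) checks two entries against central finite differences.

```python
import numpy as np

def cofi_cost_np(X, W, b, Y, R, lambda_):
    """NumPy version of the vectorized cost (illustrative)."""
    E = (X @ W.T + b - Y) * R
    return 0.5 * np.sum(E ** 2) + (lambda_ / 2) * (np.sum(X ** 2) + np.sum(W ** 2))

def cofi_grads_np(X, W, b, Y, R, lambda_):
    """Hand-derived gradients of the cost with respect to X, W and b."""
    E = (X @ W.T + b - Y) * R            # masked prediction errors, (n_m, n_u)
    grad_X = E @ W + lambda_ * X         # (n_m, n)
    grad_W = E.T @ X + lambda_ * W       # (n_u, n)
    grad_b = E.sum(axis=0, keepdims=True)  # (1, n_u); b is not regularized
    return grad_X, grad_W, grad_b

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(4, 2))
W_toy = rng.normal(size=(3, 2))
b_toy = rng.normal(size=(1, 3))
Y_toy = rng.uniform(0.5, 5.0, size=(4, 3))
R_toy = (rng.uniform(size=(4, 3)) > 0.4).astype(float)
lambda_toy = 1.5

grad_X, grad_W, grad_b = cofi_grads_np(X_toy, W_toy, b_toy, Y_toy, R_toy, lambda_toy)

# Central finite difference for one entry of X and one entry of b.
eps = 1e-6
X_plus, X_minus = X_toy.copy(), X_toy.copy()
X_plus[1, 0] += eps
X_minus[1, 0] -= eps
numeric_X = (cofi_cost_np(X_plus, W_toy, b_toy, Y_toy, R_toy, lambda_toy)
             - cofi_cost_np(X_minus, W_toy, b_toy, Y_toy, R_toy, lambda_toy)) / (2 * eps)

b_plus, b_minus = b_toy.copy(), b_toy.copy()
b_plus[0, 2] += eps
b_minus[0, 2] -= eps
numeric_b = (cofi_cost_np(X_toy, W_toy, b_plus, Y_toy, R_toy, lambda_toy)
             - cofi_cost_np(X_toy, W_toy, b_minus, Y_toy, R_toy, lambda_toy)) / (2 * eps)

print(np.isclose(numeric_X, grad_X[1, 0], atol=1e-5),
      np.isclose(numeric_b, grad_b[0, 2], atol=1e-5))
```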

6 - Recommendations

Below, we compute the ratings for all the movies and users and display the movies that are recommended. These are based on the movies and ratings entered as my_ratings[] above. To predict the rating of movie $i$ for user $j$ , you compute $\mathbf{w}^{(j)} \cdot \mathbf{x}^{(i)} + b^{(j)}$ . This can be computed for all ratings using matrix multiplication.

# Make a prediction using trained weights and biases
p = np.matmul(X.numpy(), np.transpose(W.numpy())) + b.numpy()

# restore the mean
pm = p + Ymean

# the new user's ratings were added as column 0
my_predictions = pm[:,0]

# sort predictions
ix = tf.argsort(my_predictions, direction='DESCENDING')

for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')

Predicting rating 4.51 for movie Lord of the Rings: The Two Towers, The (2002)
Predicting rating 4.44 for movie Dark Knight Rises, The (2012)
Predicting rating 4.39 for movie Particle Fever (2013)
Predicting rating 4.39 for movie Eichmann (2007)
Predicting rating 4.39 for movie Battle Royale 2: Requiem (Batoru rowaiaru II: Chinkonka) (2003)
Predicting rating 4.39 for movie Into the Abyss (2011)
Predicting rating 4.37 for movie My Sassy Girl (Yeopgijeogin geunyeo) (2001)
Predicting rating 4.37 for movie Bitter Lake (2015)
Predicting rating 4.37 for movie L.A. Slasher (2015)
Predicting rating 4.36 for movie Rivers and Tides (2001)
Predicting rating 4.36 for movie Loving Vincent (2017)
Predicting rating 4.36 for movie My Love (2006)


Original vs Predicted ratings:

Original 3.0, Predicted 3.05 for Shrek (2001)
Original 5.0, Predicted 4.80 for Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Original 2.0, Predicted 2.10 for Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Original 5.0, Predicted 4.83 for Harry Potter and the Chamber of Secrets (2002)
Original 5.0, Predicted 4.84 for Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Original 5.0, Predicted 4.86 for Lord of the Rings: The Return of the King, The (2003)
Original 3.0, Predicted 3.02 for Eternal Sunshine of the Spotless Mind (2004)
Original 3.0, Predicted 3.10 for Incredibles, The (2004)
Original 2.0, Predicted 2.09 for Persuasion (2007)
Original 5.0, Predicted 4.73 for Toy Story 3 (2010)
Original 4.0, Predicted 3.94 for Inception (2010)
Original 1.0, Predicted 1.38 for Louis Theroux: Law & Disorder (2008)
Original 1.0, Predicted 1.24 for Nothing to Declare (Rien à déclarer) (2010)
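The prediction and ranking steps used above can be traced by hand on a toy example. The sketch below uses hypothetical learned parameters for 3 movies and 2 users (all values and `_toy` names are made up): it forms the normalized predictions, adds back the per-movie means, and ranks the movies for user 0.

```python
import numpy as np

# Hypothetical learned parameters: 3 movies, 2 users, 2 features.
X_toy = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
W_toy = np.array([[0.5, -0.5],
                  [1.0,  0.2]])
b_toy = np.array([[0.1, -0.1]])
Y_mean_toy = np.array([[3.0], [3.5], [4.0]])   # per-movie mean ratings

p = X_toy @ W_toy.T + b_toy        # predictions on the mean-normalized scale
pm = p + Y_mean_toy                # restore the mean (broadcast over users)

user0_predictions = pm[:, 0]
ranking = np.argsort(-user0_predictions)   # movie indices, best first

print(user0_predictions)
print(ranking)
```

Negating the scores before `np.argsort` is one simple way to get a descending order in NumPy.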

In practice, additional information can be utilized to enhance our predictions. Above, the predicted ratings for the first few hundred movies lie in a small range. We can augment the above by selecting from those top movies, movies that have high average ratings and movies with more than 20 ratings. This section uses a Pandas data frame which has many handy sorting features.

filter=(movieList_df["number of ratings"] > 20)
movieList_df["pred"] = my_predictions
movieList_df = movieList_df.reindex(columns=["title", "number of ratings","mean rating",  "pred"])
movieList_df.loc[ix[:300]].loc[filter].sort_values("mean rating", ascending=False)

	title	number of ratings	mean rating	pred
929	Lord of the Rings: The Return of the King, The...	185	4.118919	4.856171
2700	Toy Story 3 (2010)	55	4.109091	4.726026
393	Lord of the Rings: The Fellowship of the Ring,...	198	4.106061	4.171893
2716	Inception (2010)	143	4.066434	3.940508
848	Lost in Translation (2003)	74	4.033784	3.915533
653	Lord of the Rings: The Two Towers, The (2002)	188	4.021277	4.505254
1122	Shaun of the Dead (2004)	77	4.006494	4.066710
3083	Dark Knight Rises, The (2012)	76	3.993421	4.439322
2804	Harry Potter and the Deathly Hallows: Part 1 (...	47	3.989362	4.096199
1771	Casino Royale (2006)	81	3.944444	3.966460
2649	How to Train Your Dragon (2010)	53	3.943396	4.303658
174	Traffic (2000)	70	3.900000	4.001369
2455	Harry Potter and the Half-Blood Prince (2009)	58	3.887931	4.062298
2523	Zombieland (2009)	53	3.877358	4.025288
361	Monsters, Inc. (2001)	132	3.871212	3.967351
3014	Avengers, The (2012)	69	3.869565	4.022406
1930	Harry Potter and the Order of the Phoenix (2007)	58	3.862069	4.000611
151	Crouching Tiger, Hidden Dragon (Wo hu cang lon...	110	3.836364	3.875230
793	Pirates of the Caribbean: The Curse of the Bla...	149	3.778523	4.838517
366	Harry Potter and the Sorcerer's Stone (a.k.a. ...	107	3.761682	4.803115
622	Harry Potter and the Chamber of Secrets (2002)	102	3.598039	4.831402

7 - Congratulations!

You have implemented a useful recommender system!

Implementing the gradient updates with PyTorch for better recommendations

First, import the packages we need.

import torch
import numpy as np
from torch.autograd import Variable
from recsys_utils import *

Load the dataset.

movieList, movieList_df = load_Movie_List_pd()
X, W, b, num_movies, num_features, num_users = load_precalc_params_small()

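Define the collaborative filtering cost function.

def cofi_cost_func_v(X,W,b,Y,R,lambda_):
    # masked prediction errors, then squared error plus regularization
    j = (torch.matmul(X,torch.transpose(W,0,1)) + b - Y)*R
    j = 0.5 * torch.sum(j**2) + (lambda_/2) * (torch.sum(W**2) + torch.sum(X**2))
    return j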


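my_ratings = np.zeros(num_movies)          # Initialize my ratings

# Check the file small_movie_list.csv for the id of each movie in the dataset
# For example, Toy Story 3 (2010) has ID 2700, so to rate it "5", you can set
my_ratings[2700] = 5

# Or suppose you did not enjoy Persuasion (2007), you can set
my_ratings[2609] = 2

# We have selected a few movies we liked / did not like and the ratings we
# gave are as follows: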
my_ratings[929]  = 5   # Lord of the Rings: The Return of the King, The
my_ratings[246]  = 3   # Shrek (2001)
my_ratings[2716] = 4   # Inception
my_ratings[1150] = 3   # Incredibles, The (2004)
my_ratings[382]  = 2   # Amelie (Fabuleux destin d'Amélie Poulain, Le)
my_ratings[366]  = 5   # Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
my_ratings[622]  = 5   # Harry Potter and the Chamber of Secrets (2002)
my_ratings[988]  = 3   # Eternal Sunshine of the Spotless Mind (2004)
my_ratings[2925] = 1   # Louis Theroux: Law & Disorder (2008)
my_ratings[2937] = 1   # Nothing to Declare (Rien à déclarer)
my_ratings[793]  = 5   # Pirates of the Caribbean: The Curse of the Black Pearl (2003)
my_rated = [i for i in range(len(my_ratings)) if my_ratings[i] > 0]

print('\nNew user ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0 :
        print(f'Rated {my_ratings[i]} for  {movieList_df.loc[i,"title"]}')

New user ratings:

Rated 3.0 for  Shrek (2001)
Rated 5.0 for  Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Rated 2.0 for  Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Rated 5.0 for  Harry Potter and the Chamber of Secrets (2002)
Rated 5.0 for  Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Rated 5.0 for  Lord of the Rings: The Return of the King, The (2003)
Rated 3.0 for  Eternal Sunshine of the Spotless Mind (2004)
Rated 3.0 for  Incredibles, The (2004)
Rated 2.0 for  Persuasion (2007)
Rated 5.0 for  Toy Story 3 (2010)
Rated 4.0 for  Inception (2010)
Rated 1.0 for  Louis Theroux: Law & Disorder (2008)
Rated 1.0 for  Nothing to Declare (Rien à déclarer) (2010)

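We first introduce the following functions:

- torch.randn: returns a tensor drawn from the standard normal distribution, so its entries have mean 0 and variance 1 (Gaussian white noise).
- torch.normal(mean, std): returns a tensor of random numbers drawn from normal distributions whose mean and standard deviation you specify. std may be a tensor holding the standard deviation for each output element.
- autograd.Variable: historically the core class of the autograd package. It wraps a tensor, supports nearly all tensor operations, and tracks gradients. Calling .backward() computes all gradients automatically, the raw tensor is available through .data, and the gradient is accumulated into .grad. Since PyTorch 0.4 this functionality is part of Tensor itself.
- requires_grad=True: a flag rather than a function. When it is set to True on a tensor, operations on that tensor are tracked so that gradients can be computed.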

Y,R = load_ratings_small()
Y    = np.c_[my_ratings,Y]
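R    = np.c_[(my_ratings != 0).astype(int) ,R]
print(num_users,num_movies)

Ynorm , Ymean = normalizeRatings(Y,R)

# Drop the last column so that the number of columns equals num_users again
Ynorm_1 = Ynorm[:,:-1]
R = R[:,:-1]

# Set the initial parameters (W, X)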
W = Variable(torch.randn((num_users,num_features),dtype=torch.float32 ), requires_grad=True)
X = Variable(torch.randn((num_movies,num_features),dtype=torch.float32), requires_grad=True)
b = Variable(torch.randn((1,num_users),dtype=torch.float32), requires_grad=True)

443 4778

For a deeper understanding of the grad used here, see autograd.grad.

X,W,b,Ynorm_1,R = torch.tensor(X,dtype=torch.float32,requires_grad=True),torch.tensor(W,dtype=torch.float32,requires_grad=True),torch.tensor(b,dtype=torch.float32,requires_grad=True),torch.tensor(Ynorm_1,dtype=torch.float32,requires_grad=True),torch.tensor(R,dtype=torch.float32,requires_grad=True)

# Pre-fill the .grad attributes (they are reset by optimizer.zero_grad() before each backward pass)
W.grad = torch.ones((num_users,num_features))
X.grad = torch.ones((num_movies,num_features))
b.grad = torch.ones((1,num_users))

# Set up the optimizer
learning_rate = 1e-1
optimizer = torch.optim.Adam([W,X,b], lr=learning_rate)

C:\Users\10766\AppData\Local\Temp\ipykernel_12960\1395874218.py:1: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  X,W,b,Ynorm_1,R = torch.tensor(X,dtype=torch.float32,requires_grad=True),torch.tensor(W,dtype=torch.float32,requires_grad=True),torch.tensor(b,dtype=torch.float32,requires_grad=True),torch.tensor(Ynorm_1,dtype=torch.float32,requires_grad=True),torch.tensor(R,dtype=torch.float32,requires_grad=True)

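In the lab above, TensorFlow uses tf.GradientTape to compute gradients. Briefly: TensorFlow records every operation executed inside a tf.GradientTape context on a tape, then uses that tape and the derivative associated with each recorded operation to compute gradients by reverse-mode differentiation.

The PyTorch counterparts are:

optimizer.zero_grad(): clears the gradients of all variables held by the optimizer, so that gradients from previous iterations do not accumulate
loss.backward(): backpropagates to compute the gradient of each parameter
optimizer.step(): performs one optimizer update of the parameters using those gradients

The PyTorch implementation is as follows.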

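iterations = 200
lambda_ = 1
for iter in range(iterations):
    # forward pass: compute the cost
    cost_value = cofi_cost_func_v(X,W,b,Ynorm_1,R,lambda_)
    # reset gradients, backpropagate, and update the parameters
    optimizer.zero_grad()
    cost_value.backward()
    optimizer.step()
    if iter % 20 == 0:
        print(f"Training loss at iteration {iter}: {cost_value:.1f}")

Training loss at iteration 0: 268189.8
Training loss at iteration 20: 15754.9
Training loss at iteration 40: 9339.7
Training loss at iteration 60: 7042.8
Training loss at iteration 80: 6067.2
Training loss at iteration 100: 5589.9
Training loss at iteration 120: 5330.3
Training loss at iteration 140: 5172.5
Training loss at iteration 160: 5069.0
Training loss at iteration 180: 4997.7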

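tensor.detach(): detaches a tensor from the computation graph and returns a new tensor that shares memory with the original (modifying one also changes the other) but takes no part in gradient computation. When converting a tensor to NumPy, a tensor that is part of the computation graph (requires_grad=True) must be detached first and only then converted.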

X = X.detach().numpy()
W = W.detach().numpy()
b = b.detach().numpy()


p = np.matmul(X, np.transpose(W)) + b

# restore the mean
pm = p + Ymean

my_predictions = pm[:,0]
my_predictions = torch.tensor(my_predictions,dtype=torch.float32)
print(my_predictions.dtype)

# sort predictions
ix = my_predictions.argsort(descending=True)

for i in range(17):
    j = ix[i]
    if j not in my_rated:
        print(f'Predicting rating {my_predictions[j]:0.2f} for movie {movieList[j]}')

print('\n\nOriginal vs Predicted ratings:\n')
for i in range(len(my_ratings)):
    if my_ratings[i] > 0:
        print(f'Original {my_ratings[i]}, Predicted {my_predictions[i]:0.2f} for {movieList[i]}')

torch.float32
Predicting rating 5.12 for movie Doctor Who: The Time of the Doctor (2013)
Predicting rating 5.09 for movie Black Mirror: White Christmas (2014)
Predicting rating 5.04 for movie Day of the Doctor, The (2013)
Predicting rating 5.00 for movie Harry Potter and the Order of the Phoenix (2007)
Predicting rating 5.00 for movie Harry Potter and the Deathly Hallows: Part 1 (2010)
Predicting rating 5.00 for movie John Wick (2014)
Predicting rating 4.92 for movie Dr. Horrible's Sing-Along Blog (2008)
Predicting rating 4.88 for movie Colourful (Karafuru) (2010)
Predicting rating 4.87 for movie Zombieland (2009)
Predicting rating 4.86 for movie Into the Forest of Fireflies' Light (2011)
Predicting rating 4.86 for movie Ponyo (Gake no ue no Ponyo) (2008)
Predicting rating 4.81 for movie Yi Yi (2000)
Predicting rating 4.80 for movie Deathgasm (2015)
Predicting rating 4.79 for movie Harry Potter and the Prisoner of Azkaban (2004)
Predicting rating 4.78 for movie Particle Fever (2013)
Predicting rating 4.76 for movie Indignation (2016)
Predicting rating 4.76 for movie I Am Not Your Negro (2017)


Original vs Predicted ratings:

Original 3.0, Predicted 3.32 for Shrek (2001)
Original 5.0, Predicted 4.44 for Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)
Original 2.0, Predicted 2.60 for Amelie (Fabuleux destin d'Amélie Poulain, Le) (2001)
Original 5.0, Predicted 4.76 for Harry Potter and the Chamber of Secrets (2002)
Original 5.0, Predicted 4.48 for Pirates of the Caribbean: The Curse of the Black Pearl (2003)
Original 5.0, Predicted 4.56 for Lord of the Rings: The Return of the King, The (2003)
Original 3.0, Predicted 3.17 for Eternal Sunshine of the Spotless Mind (2004)
Original 3.0, Predicted 3.74 for Incredibles, The (2004)
Original 2.0, Predicted 2.12 for Persuasion (2007)
Original 5.0, Predicted 4.46 for Toy Story 3 (2010)
Original 4.0, Predicted 3.88 for Inception (2010)
Original 1.0, Predicted 1.32 for Louis Theroux: Law & Disorder (2008)
Original 1.0, Predicted 1.17 for Nothing to Declare (Rien à déclarer) (2010)