Surprise, a Python Recommender System Library: Theory

April 29, 2023 · Source: 墨麟非攻

Surprise

Surprise is a Python scikit (a SciPy toolkit) for building and analyzing recommender systems. The Surprise User Guide provides detailed explanations.

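It supports several families of recommendation algorithms:

Basic (baseline) algorithms

Neighborhood methods (collaborative filtering)

Matrix factorization-based methods (SVD, PMF, SVD++, NMF)

The individual algorithms are introduced below.

Basic algorithms: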

　　1. random_pred.NormalPredictor

　　Description: Algorithm predicting a random rating based on the distribution of the training set, which is assumed to be normal. The mean and standard deviation of that distribution are estimated from the training ratings.

　　2. baseline_only.BaselineOnly

　　Description: Algorithm predicting the baseline estimate for a given user and item, i.e. the global mean rating plus a user bias and an item bias.
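The two basic algorithms can be sketched in a few lines of plain Python. This is a minimal illustration of the ideas, not Surprise's own code: the toy `ratings` list and all function names are made up, and the alternating update with the `reg_u` and `reg_i` regularization terms is an assumed, simplified way of fitting the biases.

```python
import random
import statistics

# Hypothetical toy training set: (user, item, rating) on a 1-5 scale.
ratings = [
    ("alice", "m1", 5), ("alice", "m2", 3),
    ("bob", "m1", 4), ("bob", "m3", 2),
    ("carol", "m2", 4), ("carol", "m3", 1),
]

def normal_predictor(ratings, rng, low=1.0, high=5.0):
    """Draw a random rating from N(mu, sigma^2) fitted to the training ratings."""
    values = [r for _, _, r in ratings]
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    # Clip the sample to the rating scale.
    return min(high, max(low, rng.gauss(mu, sigma)))

def fit_baselines(ratings, n_epochs=10, reg_u=15.0, reg_i=10.0):
    """Fit user and item biases by alternating regularized averages."""
    mu = statistics.fmean([r for _, _, r in ratings])
    b_u = {u: 0.0 for u, _, _ in ratings}
    b_i = {i: 0.0 for _, i, _ in ratings}
    for _ in range(n_epochs):
        for i in b_i:
            resid = [r - mu - b_u[u] for u, j, r in ratings if j == i]
            b_i[i] = sum(resid) / (reg_i + len(resid))
        for u in b_u:
            resid = [r - mu - b_i[i] for v, i, r in ratings if v == u]
            b_u[u] = sum(resid) / (reg_u + len(resid))
    return mu, b_u, b_i

def baseline_estimate(mu, b_u, b_i, user, item):
    """Baseline estimate: global mean + user bias + item bias (0 if unseen)."""
    return mu + b_u.get(user, 0.0) + b_i.get(item, 0.0)

rng = random.Random(0)
sample = normal_predictor(ratings, rng)
mu, b_u, b_i = fit_baselines(ratings)
print(sample, baseline_estimate(mu, b_u, b_i, "alice", "m3"))
```

An unseen user or item simply contributes a bias of zero here, so the estimate falls back toward the global mean.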

Neighborhood-based collaborative filtering algorithms:

　　3. knns.KNNBasic

　　Description: A basic collaborative filtering algorithm. The prediction is a similarity-weighted average of the ratings given by the k nearest neighbors.

　　4. knns.KNNWithMeans

　　Description: A basic collaborative filtering algorithm, taking into account the mean ratings of each user.

　　5. knns.KNNBaseline

　　Description: A basic collaborative filtering algorithm taking into account a baseline rating.
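The difference between the plain and the mean-centered neighborhood predictions can be sketched as follows. This is a user-based illustration in plain Python, not Surprise's implementation: the toy `ratings` dictionary, the hard-coded similarities in `toy_sim`, and the function names are all hypothetical. A baseline-centered variant would follow the same pattern with baseline estimates in place of the user means.

```python
# Hypothetical toy data: ratings[user][item] = rating.
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m2": 2, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}

# Hard-coded user-user similarities, for illustration only.
_SIMS = {frozenset(("alice", "bob")): 0.9, frozenset(("alice", "carol")): 0.1}

def toy_sim(u, v):
    return _SIMS.get(frozenset((u, v)), 0.0)

def top_neighbors(user, item, ratings, sim, k):
    """Up to k (similarity, neighbor) pairs among users who rated the item."""
    candidates = [
        (sim(user, v), v)
        for v, rated in ratings.items()
        if v != user and item in rated
    ]
    # Keep only positively similar neighbors, most similar first.
    return sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]

def user_mean(user, ratings):
    return sum(ratings[user].values()) / len(ratings[user])

def knn_basic(user, item, ratings, sim, k=2):
    """Similarity-weighted average of the neighbors' ratings."""
    nbrs = top_neighbors(user, item, ratings, sim, k)
    total = sum(s for s, _ in nbrs)
    if total == 0:
        return None  # no usable neighbors
    return sum(s * ratings[v][item] for s, v in nbrs) / total

def knn_with_means(user, item, ratings, sim, k=2):
    """Same idea, but on ratings centered by each user's mean rating."""
    nbrs = top_neighbors(user, item, ratings, sim, k)
    total = sum(s for s, _ in nbrs)
    if total == 0:
        return None
    offset = sum(s * (ratings[v][item] - user_mean(v, ratings)) for s, v in nbrs)
    return user_mean(user, ratings) + offset / total

basic = knn_basic("alice", "m3", ratings, toy_sim)
centered = knn_with_means("alice", "m3", ratings, toy_sim)
print(basic, centered)
```

Centering matters when users rate on different personal scales: alice rates higher on average than her neighbors, so the mean-centered prediction ends up above the plain weighted average.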

Matrix factorization methods:

　　6. matrix_factorization.SVD

　　Description: The famous SVD algorithm, as popularized by Simon Funk during the Netflix Prize.

　　7. matrix_factorization.SVDpp

　　Description: The SVD++ algorithm, an extension of SVD taking into account implicit ratings, i.e. the fact that a user rated an item at all, regardless of the rating value.

　　8. matrix_factorization.NMF

　　Description: A collaborative filtering algorithm based on Non-negative Matrix Factorization.
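The Funk-style SVD model predicts a rating as the global mean plus a user bias, an item bias, and the dot product of a user factor vector and an item factor vector, and it is typically trained by stochastic gradient descent. The following plain-Python sketch illustrates that idea under stated assumptions: the toy data, the function names, and the hyperparameter values (`n_factors`, `n_epochs`, `lr`, `reg`) are chosen for this example and are not Surprise's code or defaults.

```python
import math
import random

# Hypothetical toy training set: (user, item, rating).
ratings = [
    ("alice", "m1", 5), ("alice", "m2", 3),
    ("bob", "m1", 4), ("bob", "m3", 2),
    ("carol", "m2", 4), ("carol", "m3", 1),
]

def predict(model, u, i):
    """mu + b_u + b_i + <p_u, q_i>"""
    mu, b_u, b_i, p, q = model
    return mu + b_u[u] + b_i[i] + sum(a * b for a, b in zip(p[u], q[i]))

def train_svd(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    b_u = {u: 0.0 for u in users}
    b_i = {i: 0.0 for i in items}
    # Small random initial factors.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    model = (mu, b_u, b_i, p, q)
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(model, u, i)
            # Gradient steps on the regularized squared error.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return model

def train_rmse(model, ratings):
    sq = [(r - predict(model, u, i)) ** 2 for u, i, r in ratings]
    return math.sqrt(sum(sq) / len(sq))

model = train_svd(ratings)
mu = model[0]
mean_rmse = math.sqrt(sum((r - mu) ** 2 for _, _, r in ratings) / len(ratings))
fitted_rmse = train_rmse(model, ratings)
print(mean_rmse, fitted_rmse)
```

On this toy data the fitted model should reproduce the training ratings more closely than always predicting the global mean; on real data the comparison would be made on held-out ratings instead.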

Other collaborative filtering algorithms:

　　9. slope_one.SlopeOne

　　Description: A simple yet accurate collaborative filtering algorithm.

　　10. co_clustering.CoClustering

　　Description: A collaborative filtering algorithm based on co-clustering.
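To give a feel for why Slope One is considered simple, here is a minimal plain-Python sketch of the usual formulation: the prediction is the user's mean rating plus the average deviation between the target item and each item the user has rated. The toy `ratings` dictionary and the function name are hypothetical, and this is an illustration rather than Surprise's implementation.

```python
# Hypothetical toy data: ratings[user][item] = rating.
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m2": 2, "m3": 4},
    "carol": {"m1": 1, "m2": 5, "m3": 2},
}

def slope_one(user, item, ratings):
    """Predict a rating as the user's mean plus the mean item-to-item deviation."""
    own = ratings[user]
    mu_u = sum(own.values()) / len(own)
    devs = []
    for j in own:
        # Users who rated both the target item and item j.
        common = [r for r in ratings.values() if item in r and j in r]
        if common:
            devs.append(sum(r[item] - r[j] for r in common) / len(common))
    if not devs:
        return mu_u  # no overlapping items: fall back to the user's mean
    return mu_u + sum(devs) / len(devs)

prediction = slope_one("alice", "m3", ratings)
print(prediction)
```

Here the deviations of m3 relative to m1 (+0.5) and m2 (-0.5) cancel out, so the prediction for alice equals her mean rating of 4.0.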

The neighborhood-based (collaborative filtering) methods can be configured with different similarity measures.

Similarity measures:

　　1. cosine

　　Description: Compute the cosine similarity between all pairs of users (or items).

　　2. msd

　　Description: Compute the Mean Squared Difference similarity between all pairs of users (or items).

　　3. pearson

　　Description: Compute the Pearson correlation coefficient between all pairs of users (or items).

　　4. pearson_baseline

　　Description: Compute the (shrunk) Pearson correlation coefficient between all pairs of users (or items) using baselines for centering instead of means.
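The four measures can be sketched for a single pair of users in plain Python. The assumptions are: only items rated by both users are taken into account, the MSD is turned into a similarity as 1 / (msd + 1), and the shrunk correlation multiplies the coefficient by (n - 1) / (n - 1 + shrinkage), where n is the number of common items. The function names, the toy data, and the `shrinkage=100` value are illustrative, and this is not Surprise's code.

```python
import math

def _common(u, v):
    return [i for i in u if i in v]

def cosine(u, v):
    items = _common(u, v)
    dot = sum(u[i] * v[i] for i in items)
    norm = math.sqrt(sum(u[i] ** 2 for i in items)) * math.sqrt(sum(v[i] ** 2 for i in items))
    return dot / norm if norm else 0.0

def msd(u, v):
    items = _common(u, v)
    if not items:
        return 0.0
    mean_sq_diff = sum((u[i] - v[i]) ** 2 for i in items) / len(items)
    return 1.0 / (mean_sq_diff + 1.0)  # the +1 avoids division by zero

def _corr(du, dv):
    """Correlation of two lists of already-centered ratings."""
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den else 0.0

def pearson(u, v):
    items = _common(u, v)
    if not items:
        return 0.0
    mu_u = sum(u[i] for i in items) / len(items)
    mu_v = sum(v[i] for i in items) / len(items)
    return _corr([u[i] - mu_u for i in items], [v[i] - mu_v for i in items])

def pearson_baseline(u, v, base_u, base_v, shrinkage=100.0):
    """Center with baseline estimates (base_u[i], base_v[i]) and shrink."""
    items = _common(u, v)
    if not items:
        return 0.0
    corr = _corr([u[i] - base_u[i] for i in items], [v[i] - base_v[i] for i in items])
    n = len(items)
    return (n - 1) / (n - 1 + shrinkage) * corr

# Hypothetical ratings of two users; v rates every common item one point lower.
u = {"m1": 5, "m2": 3, "m3": 4}
v = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}
print(cosine(u, v), msd(u, v), pearson(u, v))
```

With only three common items, the shrunk coefficient ends up far below the plain Pearson value, which is the point of shrinkage: correlations supported by few common ratings are trusted less.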

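Different accuracy measures are supported.

Evaluation measures:

　　1. rmse: Root Mean Squared Error

　　2. mae: Mean Absolute Error

　　3. fcp: Fraction of Concordant Pairs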
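The three measures can be written out in plain Python as a minimal sketch. The `predictions` list of (user, true rating, estimated rating) tuples and the function names are made up for illustration. The FCP version below pools concordant and discordant pairs over all users and counts ties in the estimates as discordant, which is a simplifying assumption and not necessarily how Surprise aggregates them.

```python
import math
from collections import defaultdict

# Hypothetical predictions: (user, true rating, estimated rating).
predictions = [
    ("alice", 5, 4.5), ("alice", 3, 3.5), ("alice", 1, 2.0),
    ("bob", 4, 3.0), ("bob", 2, 3.5),
]

def rmse(predictions):
    """Root Mean Squared Error (lower is better)."""
    return math.sqrt(sum((t - e) ** 2 for _, t, e in predictions) / len(predictions))

def mae(predictions):
    """Mean Absolute Error (lower is better)."""
    return sum(abs(t - e) for _, t, e in predictions) / len(predictions)

def fcp(predictions):
    """Fraction of Concordant Pairs (higher is better)."""
    by_user = defaultdict(list)
    for user, true, est in predictions:
        by_user[user].append((true, est))
    concordant = discordant = 0
    for pairs in by_user.values():
        for true_i, est_i in pairs:
            for true_j, est_j in pairs:
                if true_i > true_j:  # item i is truly preferred to item j
                    if est_i > est_j:
                        concordant += 1
                    else:
                        discordant += 1
    total = concordant + discordant
    return concordant / total if total else 0.0

print(rmse(predictions), mae(predictions), fcp(predictions))
```

RMSE and MAE measure how far the estimates are from the true ratings, while FCP only checks whether each user's items are ranked in the right order, so a model can score well on one and poorly on the other.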

Reference article: https://blog.csdn.net/mycafe_/article/details/79146764

    Original author: 墨麟非攻
    Original URL: https://www.cnblogs.com/gezhuangzhuang/p/10206359.html