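Steps for implementing a collaborative filtering algorithm
- 1. Build the user-behavior matrix, i.e. count how many items of each product type each user has purchased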
public double[] getNumByCustomer(Customer customer){
List<OrderItem> list =orderItemDao.findByCustomerAndAliveAndState(customer.getId(),1,2);
double [] vectore =new double[totalNum];
int index=0;
for(ProductType type:productTypes){
for(OrderItem orderItem:list){
if(orderItem.getProduct().getProductType().id==type.id){
vectore[index]=vectore[index]+orderItem.getNum();
}
}
index++; // advance to the next product type's slot in the vector
}
return vectore;
}
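The nested loop above rescans every order item once per product type. As a minimal sketch of the same idea, assuming a simplified in-memory model, the version below looks up each type's slot in a map so the order items are visited only once. `BehaviorVectorSketch`, `Line`, and `toVector` are hypothetical names used for illustration; they are not part of the original project.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BehaviorVectorSketch {
    // Hypothetical stand-in for an order item: product type id plus quantity.
    static final class Line {
        final long typeId;
        final int num;
        Line(long typeId, int num) {
            this.typeId = typeId;
            this.num = num;
        }
    }

    // Builds one row of the user-behavior matrix: slot i holds the total
    // quantity the user bought of the i-th product type in typeIds.
    static double[] toVector(List<Long> typeIds, List<Line> lines) {
        Map<Long, Integer> slotByType = new HashMap<>();
        for (int i = 0; i < typeIds.size(); i++) {
            slotByType.put(typeIds.get(i), i);
        }
        double[] vector = new double[typeIds.size()];
        for (Line line : lines) {
            Integer slot = slotByType.get(line.typeId);
            if (slot != null) { // ignore types that are not in the matrix
                vector[slot] += line.num;
            }
        }
        return vector;
    }
}
```

Stacking one such vector per user gives the behavior matrix that the later steps compare row by row.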
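- 2. Use cosine similarity to compute the behavioral similarity between each user and every other user
The code below computes the similarity between two users; iterating over all pairs of users yields every similarity.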
public double countSimilarity(double [] a,double [] b){
double total=0;
double alength=0;
double blength=0;
for(int i=0;i<a.length;i++){
total=total+a[i]*b[i];
alength=alength+a[i]*a[i];
blength=blength+b[i]*b[i];
}
double down=Math.sqrt(alength)*Math.sqrt(blength);
double result=0;
if(down!=0){
result =total/down;
}
return result;
}
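To make the formula concrete, here is a standalone sketch of the same computation, cos(a, b) = (a · b) / (|a| |b|), that can be checked against hand-computed values. The class name `CosineSketch` and the length check are additions for illustration, not part of the original code.

```java
class CosineSketch {
    // Cosine similarity of two equal-length vectors.
    // Returns 0 when either vector is all zeros, so the division is never by zero.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0;
        double normA = 0;
        double normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        double denominator = Math.sqrt(normA) * Math.sqrt(normB);
        return denominator == 0 ? 0 : dot / denominator;
    }
}
```

For example, the vectors (3, 4) and (4, 3) have dot product 24 and both have length 5, so their similarity is 24 / 25 = 0.96. Two users who bought from entirely different product types, such as (1, 0) and (0, 1), score 0.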
- 3. Take the n users with the highest similarity to form the set of similar users
Sort the Map entries by value
public List<Map.Entry<Long,Double>> getMaxSimilarity(Customer customer){
Map<Long,Double> result =new HashMap<Long,Double>();
double vector[] =(double [])users.get(customer.getId());
for(Map.Entry<Long,Object> entry:users.entrySet()){
if(!entry.getKey().equals(customer.getId())){ // compare ids by value, not by reference
double [] temp =(double[])entry.getValue();
double similarity =countSimilarity(temp,vector);
result.put(entry.getKey(),similarity);
}
}
List<Map.Entry<Long,Double>> list = new LinkedList<Map.Entry<Long,Double>>( result.entrySet() );
Collections.sort( list, new Comparator<Map.Entry<Long,Double>>(){
public int compare( Map.Entry<Long,Double> o1, Map.Entry<Long,Double> o2 )
{
return (o2.getValue()).compareTo( o1.getValue() );
}
} );
return list;
}
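The method above returns every other user sorted by similarity and leaves the cut-off to the caller. As a sketch of how sorting and taking the top n could be combined, the stream-based helper below does both in one step. `TopNSketch` and `topN` are hypothetical names, and the input is assumed to be a map from user id to similarity score.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class TopNSketch {
    // Returns the n entries with the highest similarity, best first.
    static List<Map.Entry<Long, Double>> topN(Map<Long, Double> similarities, int n) {
        return similarities.entrySet().stream()
                .sorted(Map.Entry.<Long, Double>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toList());
    }
}
```

If fewer than n users exist, the list simply contains all of them, which matches the `i<list.size()&&i<3` guard used later in the original code.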
- 4. Collect the products purchased by the similar users, count how many of each they bought, and sort
public Map<Long,ProductNumModel> getProducts(List<Map.Entry<Long,Double>> list){
List<Customer> simCustomers =new ArrayList<Customer>();
System.out.println("Top 3 most similar users:");
for(int i=0;i<list.size()&&i<3;i++){
Long id =list.get(i).getKey();
Customer customer =customerDao.findByIdAndAlive(id,1);
simCustomers.add(customer);
}
Map<Long,ProductNumModel> map =new HashMap<Long,ProductNumModel>();
for(Customer customer:simCustomers){
Map<Long,ProductNumModel> hashSet =getCustomerProduct(customer);
for(Map.Entry<Long,ProductNumModel> entry:hashSet.entrySet()){
ProductNumModel model=null;
if(map.containsKey(entry.getKey())){
model=map.get(entry.getKey());
model.num+=entry.getValue().num;
}else{
model=new ProductNumModel();
model.product=entry.getValue().product;
model.num=entry.getValue().num;
}
map.put(entry.getKey(),model);
}
}
return map;
}
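The sorting half of this step is not shown in the code above (it is delegated to `sortProduct` later on). The following is only a guess at what such a ranking might look like, using plain product ids and counts instead of `ProductNumModel`. `ProductRankingSketch`, `mergeCounts`, and `rank` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ProductRankingSketch {
    // Sums the per-product purchase counts of all similar users.
    static Map<Long, Integer> mergeCounts(List<Map<Long, Integer>> perUser) {
        Map<Long, Integer> total = new HashMap<>();
        for (Map<Long, Integer> counts : perUser) {
            for (Map.Entry<Long, Integer> e : counts.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    // Orders product ids by count, highest first; ties are broken by id
    // so the result is deterministic.
    static List<Long> rank(Map<Long, Integer> counts) {
        List<Map.Entry<Long, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : Long.compare(a.getKey(), b.getKey());
        });
        List<Long> ids = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : entries) {
            ids.add(e.getKey());
        }
        return ids;
    }
}
```

`Map.merge` replaces the explicit `containsKey` branch: it inserts the count when the product is new and adds to the existing count otherwise.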
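- Top-level entry function: it chains the functions above together and saves the result to a file. If the file does not exist, the recommendations are computed by the algorithm; if the file already has content, that content is read directly. Schedule a timed job that deletes the recommendation file every day or every week, and the recommendations are refreshed automatically.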
public Map<String,Object> getAllSimilarity(Customer customer) throws IOException {
changeCustomerToVector();
for(Map.Entry<Long,Object> entry:users.entrySet()){
double [] temp=(double [])entry.getValue();
}
InputStream inputStream = this.getClass().getClassLoader().getResourceAsStream("cxtx.properties");
Properties p = new Properties();
try {
p.load(inputStream);
} catch (IOException e1) {
e1.printStackTrace();
}
String folderPath = p.getProperty("recommendFile");
File file=new File(folderPath);
if(!file.exists()){
file.createNewFile();
}
FileInputStream fileInputStream=new FileInputStream(file);
Map<String,Object> map =new HashMap<String,Object>();
com.alibaba.fastjson.JSONObject jsonObject = null;
try {
if(fileInputStream!=null){
jsonObject = com.alibaba.fastjson.JSON.parseObject(IOUtils.toString(fileInputStream, "UTF-8"));
}
} catch (IOException e) {
map.put("msg","Invalid JSON format");
map.put("content","");
return map;
}
Object content=null;
if(jsonObject==null){ // nothing cached in the file yet, so compute the recommended products for every user
FileWriter fileWriter=new FileWriter(file,true);
BufferedWriter bufferedWriter=new BufferedWriter(fileWriter);
Map<Long,Object> temp =new HashMap<Long,Object>();
for(Customer c:customers){
List<Map.Entry<Long,Double>> list =this.getMaxSimilarity(c);
Map<Long,ProductNumModel> result =getProducts(list);
List<Product> list1=sortProduct(result);
temp.put(c.getId(),list1);
}
JSONObject object=new JSONObject(temp);
bufferedWriter.write(object.toString());
bufferedWriter.flush();
bufferedWriter.close(); // release the file handle once the cache has been written
if(object!=null){
content= object.get(customer.getId()+"");
}
}else{
if(null!=jsonObject.get(customer.getId()+"")){
content=jsonObject.get(customer.getId()+"");
}
}
map.put("msg","Retrieved successfully");
map.put("content",content);
return map;
}
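The caching idea in this function (read the file if it has content, otherwise compute and store) can be isolated from the JSON and DAO details. Below is a minimal sketch using `java.nio.file`, assuming the recommendations have already been serialized to a string. `RecommendationCacheSketch`, `loadOrCompute`, and `invalidate` are hypothetical names, not the project's API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

class RecommendationCacheSketch {
    // Returns the cached content if the file exists and is non-empty;
    // otherwise runs the computation, stores its result, and returns it.
    static String loadOrCompute(Path cacheFile, Supplier<String> compute) throws IOException {
        if (Files.exists(cacheFile) && Files.size(cacheFile) > 0) {
            return new String(Files.readAllBytes(cacheFile), StandardCharsets.UTF_8);
        }
        String fresh = compute.get();
        Files.write(cacheFile, fresh.getBytes(StandardCharsets.UTF_8));
        return fresh;
    }

    // Deleting the file forces the next call to recompute.
    static void invalidate(Path cacheFile) throws IOException {
        Files.deleteIfExists(cacheFile);
    }
}
```

A daily or weekly scheduled job would only need to call something like `invalidate`; the next request then recomputes and rewrites the file.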
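1. When computing user similarity, handle the case where the denominator is 0. Also guard against values growing too large for a double to represent; one option is to rescale each dimension of the vector, for example by dividing by the largest sales quantity of any product, or by subtracting some value.
2. The closer the cosine value is to 1, the more similar the two vectors are; that is, the larger the computed value, the more alike the users' behavior.
3. The final number of recommended products may be large or small, so rank them by some strategy, for example by the quantity purchased by similar users rather than by each product's total sales, because data from dissimilar users tends to introduce noise.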
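One way to read the rescaling suggestion in note 1 is to divide every dimension by the largest value seen in that dimension, so each entry falls between 0 and 1. The sketch below shows this under that assumption. `NormalizationSketch` and `scaleByColumnMax` are hypothetical names, and the rows are assumed to be non-negative purchase counts of equal length.

```java
class NormalizationSketch {
    // Divides each column by its maximum so every entry lies in [0, 1].
    // Columns that are entirely zero stay zero to avoid dividing by zero.
    static double[][] scaleByColumnMax(double[][] rows) {
        if (rows.length == 0) {
            return new double[0][];
        }
        int dims = rows[0].length;
        double[] max = new double[dims];
        for (double[] row : rows) {
            for (int j = 0; j < dims; j++) {
                max[j] = Math.max(max[j], row[j]);
            }
        }
        double[][] scaled = new double[rows.length][dims];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < dims; j++) {
                scaled[i][j] = max[j] == 0 ? 0 : rows[i][j] / max[j];
            }
        }
        return scaled;
    }
}
```

Because each dimension gets its own divisor, this also changes the relative weight of the product types, so the resulting cosine values can differ from those computed on raw counts.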