Classification Models: Logistic Regression

August 29, 2023. Source: 叶青婧

Logistic regression

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(x_train, y_train)
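
The three lines above assume that x_train and y_train already exist. Below is a minimal, self-contained sketch that uses a synthetic data set from make_classification as a stand-in for the real data. The data set, the split ratio, max_iter and the random_state values are illustrative assumptions, not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the real features and labels (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# max_iter is raised only to make convergence warnings less likely.
clf = LogisticRegression(max_iter=1000)
clf.fit(x_train, y_train)

# predict() returns hard class labels; predict_proba() returns one column per class.
labels = clf.predict(x_test)
probabilities = clf.predict_proba(x_test)
print(labels[:5])
print(probabilities[:5])
```

The second column of predict_proba holds the estimated probability of the positive class, which is what the ROC code later in this post consumes.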

Accuracy and recall

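Accuracy: scikit-learn provides accuracy_score to compute it; LogisticRegression.score() returns the same mean accuracy on the given data and labels.
Accuracy is the proportion of predictions the classifier gets right, but it cannot distinguish false positive errors from false negative errors.
Precision is the proportion of the messages the classifier predicts as spam that really are spam, P=TP/(TP+FP)
Recall, called sensitivity in medicine, is in this example the proportion of all truly spam messages that the classifier correctly finds, R=TP/(TP+FN)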
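
To make these formulas concrete, here is a small sketch with hand-made labels. The arrays y_true and y_pred are made up for illustration, with 1 meaning spam. It reads TP, FP, FN and TN off the confusion matrix and compares the hand-computed values with scikit-learn's own metric functions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels: 1 = spam, 0 = not spam.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # P = TP / (TP + FP)
recall = tp / (tp + fn)     # R = TP / (TP + FN)

print('By hand:', accuracy, precision, recall)
print('sklearn:', accuracy_score(y_true, y_pred),
      precision_score(y_true, y_pred),
      recall_score(y_true, y_pred))
```

With these labels the accuracy is 0.7 even though half of the spam messages are missed (recall 0.5), which is the kind of difference accuracy alone does not reveal.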

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Hard class labels for the held-out test set
predictions = clf.predict(x_test)
print('Accuracy:', accuracy_score(y_test, predictions))
print('Precision:', precision_score(y_test, predictions))
print('Recall:', recall_score(y_test, predictions))
print('F1 score:', f1_score(y_test, predictions))

from sklearn.metrics import classification_report, accuracy_score, confusion_matrix
predictions = clf.predict(x_test)
print('Accuracy:', accuracy_score(y_test, predictions))
# The matrix and the report span several lines, so start each on a new line
print('Confusion matrix:\n', confusion_matrix(y_test, predictions))
print('Classification report:\n', classification_report(y_test, predictions))

ROC AUC

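The ROC curve (Receiver Operating Characteristic curve) can be used to visualize a classifier's performance. Unlike accuracy, the ROC curve is insensitive to data sets with unbalanced class proportions, and it shows the classifier's performance across all values of the decision threshold rather than at a single one. The ROC curve plots the classifier's recall against its fall-out. Fall-out, also called the false positive rate, is the proportion of all negative samples that the classifier labels as positive:
F=FP/(TN+FP)
AUC is the area under the ROC curve. It reduces the curve to a single value that summarizes the classifier's expected performance: 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.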
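
As a rough illustration of these definitions, the sketch below uses made-up labels and scores (y_true, scores and the 0.5 threshold are hypothetical) to compute the fall-out and recall at one threshold by hand, and the AUC with roc_auc_score.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical ground truth and predicted positive-class scores.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.1, 0.3, 0.6, 0.4, 0.35, 0.7, 0.8, 0.9])

# Turn scores into hard labels at one threshold.
threshold = 0.5
y_pred = (scores >= threshold).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fall_out = fp / (tn + fp)  # F = FP / (TN + FP)
recall = tp / (tp + fn)    # R = TP / (TP + FN)

# AUC is computed from the scores, not from the thresholded labels.
auc_value = roc_auc_score(y_true, scores)

print('fall-out:', fall_out, 'recall:', recall, 'AUC:', auc_value)
```

Each point on a ROC curve is one such (fall-out, recall) pair; sweeping the threshold over all score values traces out the whole curve.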

import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
# Column 1 of predict_proba is the estimated probability of the positive class
predictions = clf.predict_proba(x_test)
false_positive_rate, recall, thresholds = roc_curve(y_test, predictions[:, 1])
roc_auc = auc(false_positive_rate, recall)
plt.title('Receiver Operating Characteristic')
plt.plot(false_positive_rate, recall, 'b', label='AUC = %0.2f' % roc_auc)
plt.legend(loc='lower right')
# Diagonal reference line: a classifier that guesses at random
plt.plot([0, 1], [0, 1], 'r--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.0])
plt.ylabel('Recall')
plt.xlabel('Fall-out')
plt.show()

How the model works

http://blog.csdn.net/sergeyca…
http://blog.csdn.net/zjuPeco/…

    Original author: 叶青婧
    Original URL: https://segmentfault.com/a/1190000013578856