
11 Important Model Evaluation Methods

Published by qiuliangliang on 2017/05/30 18:51

Confidence Interval

http://www.datasciencecentral.com/profiles/blogs/black-box-confidence-intervals-excel-and-perl-implementations-det

Confidence intervals are used to assess how reliable a statistical estimate is. Wide confidence intervals mean that your model is poor (and it is worth investigating other models), or that your data is very noisy if confidence intervals don't improve when changing the model (that is, when testing a different theoretical statistical distribution for your observations). Modern confidence intervals are model-free and data-driven: click here to see how to compute them. A more general framework to assess and reduce sources of variance is called analysis of variance. Modern definitions of variance have a number of desirable properties.

A confidence interval is an interval estimate of a population parameter, constructed from a sample statistic. In statistics, the confidence interval of a probability sample is an interval estimate of some population parameter of that sample. It expresses the probability with which the true value of the parameter falls within a range around the measured result, i.e. how trustworthy the measured value of the parameter is (the "probability" mentioned above).

Pr(c1 ≤ μ ≤ c2) = 1 − α

where α is the significance level (e.g. 0.05 or 0.10),

100% × (1 − α) is the confidence level (e.g. 95% or 90%),

and the interval (c1, c2) is the confidence interval.
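As a minimal sketch (not part of the original article), a normal-approximation confidence interval for a sample mean can be computed as follows; the data values are invented for illustration:

```python
import math
import statistics

def mean_confidence_interval(data, z=1.96):
    """Normal-approximation confidence interval (c1, c2) for the mean.

    z = 1.96 corresponds to a 95% confidence level (alpha = 0.05).
    """
    n = len(data)
    mean = statistics.mean(data)
    # Standard error of the mean, using the sample standard deviation.
    se = statistics.stdev(data) / math.sqrt(n)
    return mean - z * se, mean + z * se

c1, c2 = mean_confidence_interval([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7])
print(f"95% CI: ({c1:.3f}, {c2:.3f})")
```

For small samples, a t-quantile would be more appropriate than the fixed z value used in this sketch.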

Confusion Matrix. Used in the context of clustering. These N x N matrices (where N is the number of clusters) are built as follows: the element in cell (i, j) represents the number of observations, in the test training set (as opposed to the control training set, in a cross-validation setting), that belong to cluster i and are assigned (by the clustering algorithm) to cluster j. When these numbers are transformed into proportions, these matrices are sometimes called contingency tables. A wrongly assigned observation is called a false positive (a non-fraudulent transaction erroneously labelled as fraudulent) or a false negative (a fraudulent transaction erroneously labelled as non-fraudulent). The higher the concentration of observations on the diagonal of the confusion matrix, the higher the accuracy / predictive power of your clustering algorithm.
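The cell-(i, j) construction above can be sketched in a few lines; the fraud/ok labels below are made-up toy data, not from the article:

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Counts in cell (i, j): observations of class labels[i]
    assigned by the model to class labels[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

actual    = ["fraud", "ok", "ok", "fraud", "ok", "ok"]
predicted = ["fraud", "ok", "fraud", "ok", "ok", "ok"]
m = confusion_matrix(actual, predicted, ["fraud", "ok"])
# m[0][1] counts false negatives (fraud labelled ok);
# m[1][0] counts false positives (ok labelled fraud).
print(m)  # → [[1, 1], [1, 3]]
```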

Gain and Lift Chart. Lift is a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model. Cumulative gains and lift charts are visual aids for measuring model performance. Both charts consist of a lift curve and a baseline. Click here for details
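One point on a lift chart can be computed directly from the ratio definition above: the response rate in the top-scored fraction of cases divided by the overall response rate. This is an illustrative sketch with invented scores and outcomes:

```python
def lift(scores_and_outcomes, top_fraction):
    """Lift = response rate in the top-scored fraction / overall rate."""
    ranked = sorted(scores_and_outcomes, key=lambda so: so[0], reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    overall_rate = sum(y for _, y in ranked) / len(ranked)
    return top_rate / overall_rate

# (model score, actual outcome) pairs; outcome 1 = responder.
data = [(0.9, 1), (0.8, 1), (0.7, 0), (0.6, 1), (0.4, 0),
        (0.3, 0), (0.2, 0), (0.2, 1), (0.1, 0), (0.1, 0)]
print(lift(data, 0.2))  # → 2.5: the top 20% respond 2.5x the baseline rate
```

Evaluating this at every decile and plotting against the baseline (lift = 1) gives the lift curve.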

Kolmogorov-Smirnov Chart. This non-parametric statistical test is used to compare two distributions, to assess how close they are to each other. In this context, one of the distributions is the theoretical distribution that the observations are supposed to follow (usually a continuous distribution with one or two parameters, such as Gaussian law), while the other distribution is the actual, empirical, parameter-free, discrete distribution computed on the observations.
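The KS statistic is the largest vertical distance between the empirical CDF of the observations and the theoretical CDF. A hand-rolled sketch (in practice a library routine such as one from scipy.stats would be used; the sample below is invented):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the Gaussian law, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def ks_statistic(sample, cdf):
    """Max distance between the empirical CDF and a theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
    return d

sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
print(ks_statistic(sample, normal_cdf))
```

The closer the statistic is to 0, the closer the observations are to the theoretical distribution.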

Chi Square. It is another statistical test similar to Kolmogorov-Smirnov, but in this case it is a parametric test. It requires you to aggregate observations into a number of buckets or bins, each with at least 10 observations.
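The binned statistic itself is simply the sum of (observed − expected)² / expected over the bins. A sketch with made-up counts:

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over the bins; each bin should hold
    at least ~10 observations for the test to be reliable."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed counts in 4 bins vs. counts expected under the candidate model.
observed = [18, 22, 31, 29]
expected = [25, 25, 25, 25]
print(chi_square_statistic(observed, expected))  # → 4.4
```

The statistic is then compared against a chi-square quantile with (number of bins − 1 − number of fitted parameters) degrees of freedom.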

ROC curve. Unlike the lift chart, the ROC curve is almost independent of the response rate. The receiver operating characteristic (ROC), or ROC curve, is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity or the sensitivity index d', known as "d-prime" in signal detection and biomedical informatics, or recall in machine learning. The false-positive rate is also known as the fall-out and can be calculated as (1 - specificity). The ROC curve is thus the sensitivity as a function of fall-out. Click here for details
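Plotting TPR against FPR as the threshold varies can be sketched as follows; the scores and labels are invented toy data:

```python
def roc_points(scores, labels, thresholds):
    """(FPR, TPR) at each threshold: predict positive when score >= t."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]
print(roc_points(scores, labels, [0.0, 0.5, 1.0]))
# → [(1.0, 1.0), (0.25, 0.75), (0.0, 0.0)]
```

Sweeping the threshold over all distinct scores traces the full curve from (0, 0) to (1, 1).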

Gini Coefficient.  The Gini coefficient is sometimes used in classification problems.  Gini = 2*AUC  - 1, where AUC is the area under the curve (see the ROC curve entry above). A Gini ratio above 60% corresponds to a good model. Not to be confused with the Gini index or Gini impurity, used when building decision trees.
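The AUC in the formula Gini = 2*AUC − 1 can be computed directly as the probability that a randomly chosen positive outscores a randomly chosen negative; a sketch on invented data:

```python
def auc_from_scores(scores, labels):
    """AUC = P(random positive outscores random negative); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1,   1,   0,   1,   0,   1,   0,   0]
auc = auc_from_scores(scores, labels)
gini = 2 * auc - 1
print(auc, gini)  # → 0.8125 0.625
```

Here Gini ≈ 62.5%, which by the rule of thumb above would indicate a good model.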

Root Mean Square Error. RMSE is the most used and abused metric to compute goodness of fit. It is defined as the square root of the average squared difference between true values and predicted values, and is widely used by Excel users.

L^1 version of RMSE. The RMSE metric (see above entry) is an L^2 metric, sensitive to outliers. Modern metrics are L^1 and sometimes based on rank statistics rather than raw data. One of these new metrics, developed by our data scientist, is described here.
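The L^2 metric (RMSE) and its simplest L^1 counterpart, the mean absolute error, can both be sketched in a few lines; the values below are made up:

```python
import math

def rmse(actual, predicted):
    """L^2 metric: square root of the mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """L^1 metric: mean absolute error, less sensitive to outliers."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual    = [3.0, 5.0, 2.5, 7.0]
predicted = [2.5, 5.0, 4.0, 8.0]
print(rmse(actual, predicted), mae(actual, predicted))
```

A single large error inflates RMSE much more than MAE, which is why the L^1 version is preferred on outlier-prone data.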

Cross Validation. This is a general framework to assess how a model will perform in the future; it is also used for model selection. It consists of splitting your training set into test and control data sets, training your algorithm (classifier, or predictive algorithm) on the control data set, and testing it on the test data set. Since the true values are known on the test data set, you can compare them with your predicted values, using one of the other comparison tools mentioned in this article. Usually the test data set itself is split into multiple subsets or data bins, to compute confidence intervals for predicted values. The test data set must be carefully selected, and must include different time frames and different types of observations (compared with the control data set), each with enough data points, in order to get sound, reliable conclusions as to how the model will perform on future data, or on data that has slightly evolved. Another idea is to introduce noise in the test data set and see how it impacts prediction: this is referred to as model sensitivity analysis.
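The repeated split into control (training) and test subsets described above is usually implemented as k-fold cross validation. A minimal index-splitting sketch (no real model is fitted here):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled folds; each fold serves
    once as the held-out test set, the rest as the control set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for f in folds if f is not folds[i] for j in f]
        splits.append((train, folds[i]))
    return splits

for train, test in k_fold_indices(10, 5):
    # fit on `train`, score on `test`, then aggregate the k scores
    assert not set(train) & set(test)
```

Aggregating the k per-fold scores (e.g. RMSE) gives both an average performance estimate and a spread, which feeds the confidence-interval idea above.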

Predictive Power. This metric was developed internally at Data Science Central by our data scientist. It is related to the concept of entropy or the Gini index mentioned above in this article. It was designed as a synthetic metric satisfying interesting properties, and used to select a good subset of features in any machine learning project, or as a criterion to decide which node to split at each iteration, when building decision trees. Click here for details.

 

Reposted from: http://www.datasciencecentral.com/profiles/blogs/7-important-model-evaluation-error-metrics-everyone
