文档章节

11个重要的模型评估方法

q
 qiuliangliang
发布于 2017/05/30 18:51
字数 1096
阅读 7
收藏 0

Confidence Interval

http://www.datasciencecentral.com/profiles/blogs/black-box-confidence-intervals-excel-and-perl-implementations-det

Confidence intervals are used to assess how reliable a statistical estimate is. Wide confidence intervals mean that your model is poor (and it is worth investigating other models), or that your data is very noisy if confidence intervals don't improve by changing the model (that is, testing a different theoretical statistical distribution for your observations.) Modern confidence intervals are model-free, data -driven: click here to see how to compute them. A more general framework to assess and reduce sources of variance is called analysis of variance. Modern definitions of variance have a number of desirable properties

置信区间是指由样本统计量所构造的总体参数的估计区间。在统计学中,一个概率样本的置信区间(Confidence interval)是对这个样本的某个总体参数的区间估计。置信区间展现的是这个参数的真实值有一定概率落在测量结果的周围的程度。置信区间给出的是被测量参数的测量值的可信程度,即前面所要求的“一个概率”。

Pr(c1<=μ<=c2)=1-α

α是显著性水平(例:0.05或0.10)

100%*(1-α)指置信水平(例:95%或90%)

表达方式:interval(c1,c2)——置信区间。

Confusion Matrix. Used in the context of clustering. These N x N matrices (where N is the number of clusters) are designed as followed: the element in cell (i, j) represents the number of observations, in the test training set (as opposed to the control training set, in a cross-validation setting) that belong to cluster i and are assigned (by the clustering algorithm) to cluster j. When these numbers are transformed into proportions, these matrices are sometimes called contingency tables. A wrongly assigned observation is called false positive (non-fraudulent transaction erroneously labelled as fraudulent) or false negative (fraudulent transaction erroneously labelled as non- fraudulent). The higher the concentration of observations in the diagonal of the confusion matrix, the higher the accuracy / predictive power of your clustering algorithm.

Gain and Lift Chart. Lift is a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model. Cumulative gains and lift charts are visual aids for measuring model performance. Both charts consist of a lift curve and a baseline. Click here for details

Kolmogorov-Smirnov Chart. This non-parametric statistical test is used to compare two distributions, to assess how close they are to each other. In this context, one of the distributions is the theoretical distribution that the observations are supposed to follow (usually a continuous distribution with one or two parameters, such as Gaussian law), while the other distribution is the actual, empirical, parameter-free, discrete distribution computed on the observations.

Chi Square. It is another statistical test similar to Kolmogorov-Smirnov, but in this case it is a parametric test. It requires you to aggreate observations in a number of buckets or bins, each with at least 10 observations. 

ROC curve. Unlike the lift chart, the ROC curve is almost independent of the response rate. The receiver operating characteristic (ROC), or ROC curve, is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity or the sensitivity index d', known as "d-prime" in signal detection and biomedical informatics, or recall in machine learning. The false-positive rate is also known as the fall-out and can be calculated as (1 - specificity). The ROC curve is thus the sensitivity as a function of fall-out. Click here for details

Gini Coefficient.  The Gini coefficient is sometimes used in classification problems.  Gini = 2*AUC  - 1, where AUC is the area under the curve (see the ROC curve entry above). A Gini ratio above 60% corresponds to a good model. Not to be confused with the Gini index or Gini impurity, used when building decision trees.

Root Mean Square Error. RMSE is the must used and abused metric to compute goodness of fit. It is defined as the square root of the absolute value of the correlation coefficient between true values and predicted values, and widely used by Excel users.

L^1 version of RSME. The RSME metric (see above entry) is an L^2 metric, sensitive to outliers. Modern metrics are L^1 and sometimes based on rank statistics rather than raw data. One of these new metrics, developed by our data scientist, is described here.  

Cross Validation. This is a general framework to assess how a model will perform in the future; it is also used for model selection. It consists of splitting your training set into test and control data sets, training your algorithm (classifier, or predictive algorithm) on the control data set, and testing it on the test data set. Since the true values are known on the test data set, you can compare them with your predicted values, using one of the other comparison tools mentioned in this article. Usually the test data set itself is split into multiple subsets or data bins, to compute confidence intervals for predicted values. The test data set must be carefully selected, and must include different time frames and different types of observations (compared with the control data set), each with enough data points, in order to get sound, reliable conclusions as how the model will perform on future data, or on data that has slightly involved. Another idea is to introduce noise in the test data set and see how it impacts prediction: this is referred to as model sensitivity analysis. 

Predictive Power. This metric was developed internally at Data Science Central by our data scientist. It is related to the concept of entropy or the Gini index mentioned above in this article. It was designed as a synthetic metric satisfying interesting properties, and used to select a good subset of features in any machine learning project, or as a criterion to decide which node to split at each iteration, when building decision trees. Click here for details.

 

本文转载自:http://www.datasciencecentral.com/profiles/blogs/7-important-model-evaluation-error-metrics-everyone

共有 人打赏支持
q
粉丝 0
博文 2
码字总数 0
作品 0
朝阳
入门 | 用机器学习进行欺诈预测的模型设计

Airbnb网站基于允许任何人将闲置的房屋进行长期或短期出租构建商业模式,来自房客或房东的欺诈风险是必须解决的问题。Airbnb信任和安全小组通过构建机器学习模型进行欺诈预测,本文介绍了其设...

技术小能手
09/11
0
0
GIS几个重要的研究方向

1 空间数据库的准确性研究 地理信息数据中误差处理和不确定性错误处理的方法和技术 ,包括 : 不确定性误差模型 ; 误差跟踪并对误差进行编码的方法 ; 计算和表达在 GIS应用中的误差 ; 数据精度...

晨曦之光
2012/04/12
170
1
基于机器学习的工控安全风险评估

1 引言 随着工业控制网络与企业信息网络的不断融合,工业控制系统的安全管理受到了重大的挑战。工控系统安全等级评估是安全管理的重要内容,传统的安全等级评估方法主要有故障树分析法、层次...

liqing1310200526
03/14
0
0
sklearn调包侠之朴素贝叶斯

文档处理 朴素贝叶斯算法常用于文档的分类问题上,但计算机是不能直接理解文档内容的,怎么把文档内容转换为计算机可以计算的数字,这是自然语言处理(NLP)中很重要的内容。 TF-IDF方法 今天...

罗罗攀
07/03
0
0
机器学习-2:MachineLN之模型评估

开篇废话: 很多文章其实都是将书中的东西、网上课程、或者别人的论文的东西总结一下,发出来,但是个人感觉还是加入个人的理解,然后加上一些工程中遇到的问题一起发出来会更好些。 正如学习...

MachineLP
01/10
0
0

没有更多内容

加载失败,请刷新页面

加载更多

70.shell的函数 数组 告警系统需求分析

20.16/20.17 shell中的函数 20.18 shell中的数组 20.19 告警系统需求分析 20.16/20.17 shell中的函数: ~1. 函数就是把一段代码整理到了一个小单元中,并给这个小单元起一个名字,当用到这段...

王鑫linux
今天
0
0
分布式框架spring-session实现session一致性使用问题

前言:项目中使用到spring-session来缓存用户信息,保证服务之间session一致性,但是获取session信息为什么不能再服务层获取? 一、spring-session实现session一致性方式 用户每一次请求都会...

WALK_MAN
今天
5
0
C++ yield()与sleep_for()

C++11 标准库提供了yield()和sleep_for()两个方法。 (1)std::this_thread::yield(): 线程调用该方法时,主动让出CPU,并且不参与CPU的本次调度,从而让其他线程有机会运行。在后续的调度周...

yepanl
今天
4
0
Java并发编程实战(chapter_3)(线程池ThreadPoolExecutor源码分析)

这个系列一直没再写,很多原因,中间经历了换工作,熟悉项目,熟悉新团队等等一系列的事情。并发课题对于Java来说是一个又重要又难的一大块,除非气定神闲、精力满满,否则我本身是不敢随便写...

心中的理想乡
今天
31
0
shell学习之获取用户的输入命令read

在运行脚本的时候,命令行参数是可以传入参数,还有就是在脚本运行过程中需要用户输入参数,比如你想要在脚本运行时问个问题,并等待运行脚本的人来回答。bash shell为此提 供了read命令。 ...

woshixin
今天
4
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部