11个重要的模型评估方法
11个重要的模型评估方法
qiuliangliang 发表于11个月前
11个重要的模型评估方法
  • 发表于 11个月前
  • 阅读 3
  • 收藏 0
  • 点赞 0
  • 评论 0

【腾讯云】买域名送云解析+SSL证书+建站!>>>   

Confidence Interval

http://www.datasciencecentral.com/profiles/blogs/black-box-confidence-intervals-excel-and-perl-implementations-det

Confidence intervals are used to assess how reliable a statistical estimate is. Wide confidence intervals mean that your model is poor (and it is worth investigating other models), or that your data is very noisy if confidence intervals don't improve by changing the model (that is, testing a different theoretical statistical distribution for your observations.) Modern confidence intervals are model-free, data -driven: click here to see how to compute them. A more general framework to assess and reduce sources of variance is called analysis of variance. Modern definitions of variance have a number of desirable properties

置信区间是指由样本统计量所构造的总体参数的估计区间。在统计学中,一个概率样本的置信区间(Confidence interval)是对这个样本的某个总体参数的区间估计。置信区间展现的是这个参数的真实值有一定概率落在测量结果的周围的程度。置信区间给出的是被测量参数的测量值的可信程度,即前面所要求的“一个概率”。

Pr(c1<=μ<=c2)=1-α

α是显著性水平(例:0.05或0.10)

100%*(1-α)指置信水平(例:95%或90%)

表达方式:interval(c1,c2)——置信区间。

Confusion Matrix. Used in the context of clustering. These N x N matrices (where N is the number of clusters) are designed as followed: the element in cell (i, j) represents the number of observations, in the test training set (as opposed to the control training set, in a cross-validation setting) that belong to cluster i and are assigned (by the clustering algorithm) to cluster j. When these numbers are transformed into proportions, these matrices are sometimes called contingency tables. A wrongly assigned observation is called false positive (non-fraudulent transaction erroneously labelled as fraudulent) or false negative (fraudulent transaction erroneously labelled as non- fraudulent). The higher the concentration of observations in the diagonal of the confusion matrix, the higher the accuracy / predictive power of your clustering algorithm.

Gain and Lift Chart. Lift is a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model. Cumulative gains and lift charts are visual aids for measuring model performance. Both charts consist of a lift curve and a baseline. Click here for details

Kolmogorov-Smirnov Chart. This non-parametric statistical test is used to compare two distributions, to assess how close they are to each other. In this context, one of the distributions is the theoretical distribution that the observations are supposed to follow (usually a continuous distribution with one or two parameters, such as Gaussian law), while the other distribution is the actual, empirical, parameter-free, discrete distribution computed on the observations.

Chi Square. It is another statistical test similar to Kolmogorov-Smirnov, but in this case it is a parametric test. It requires you to aggreate observations in a number of buckets or bins, each with at least 10 observations. 

ROC curve. Unlike the lift chart, the ROC curve is almost independent of the response rate. The receiver operating characteristic (ROC), or ROC curve, is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. The curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity or the sensitivity index d', known as "d-prime" in signal detection and biomedical informatics, or recall in machine learning. The false-positive rate is also known as the fall-out and can be calculated as (1 - specificity). The ROC curve is thus the sensitivity as a function of fall-out. Click here for details

Gini Coefficient.  The Gini coefficient is sometimes used in classification problems.  Gini = 2*AUC  - 1, where AUC is the area under the curve (see the ROC curve entry above). A Gini ratio above 60% corresponds to a good model. Not to be confused with the Gini index or Gini impurity, used when building decision trees.

Root Mean Square Error. RMSE is the must used and abused metric to compute goodness of fit. It is defined as the square root of the absolute value of the correlation coefficient between true values and predicted values, and widely used by Excel users.

L^1 version of RSME. The RSME metric (see above entry) is an L^2 metric, sensitive to outliers. Modern metrics are L^1 and sometimes based on rank statistics rather than raw data. One of these new metrics, developed by our data scientist, is described here.  

Cross Validation. This is a general framework to assess how a model will perform in the future; it is also used for model selection. It consists of splitting your training set into test and control data sets, training your algorithm (classifier, or predictive algorithm) on the control data set, and testing it on the test data set. Since the true values are known on the test data set, you can compare them with your predicted values, using one of the other comparison tools mentioned in this article. Usually the test data set itself is split into multiple subsets or data bins, to compute confidence intervals for predicted values. The test data set must be carefully selected, and must include different time frames and different types of observations (compared with the control data set), each with enough data points, in order to get sound, reliable conclusions as how the model will perform on future data, or on data that has slightly involved. Another idea is to introduce noise in the test data set and see how it impacts prediction: this is referred to as model sensitivity analysis. 

Predictive Power. This metric was developed internally at Data Science Central by our data scientist. It is related to the concept of entropy or the Gini index mentioned above in this article. It was designed as a synthetic metric satisfying interesting properties, and used to select a good subset of features in any machine learning project, or as a criterion to decide which node to split at each iteration, when building decision trees. Click here for details.

 

  • 打赏
  • 点赞
  • 收藏
  • 分享
共有 人打赏支持
粉丝 0
博文 2
码字总数 0
×
qiuliangliang
如果觉得我的文章对您有用,请随意打赏。您的支持将鼓励我继续创作!
* 金额(元)
¥1 ¥5 ¥10 ¥20 其他金额
打赏人
留言
* 支付类型
微信扫码支付
打赏金额:
已支付成功
打赏金额: