
The OpenCV3 Machine Learning Algorithm SVM - Using Python

openthings
Published 2016/03/01 15:18


http://docs.opencv.org/master/d4/db1/tutorial_py_svm_basics.html


Goal

In this chapter

  • We will see an intuitive understanding of SVM


Theory

Linearly Separable Data

Consider the image below, which has two types of data, red and blue. In kNN, for a test sample we measured its distance to all the training samples and took the one with the minimum distance. Measuring all the distances takes plenty of time, and storing all the training samples takes plenty of memory. But given the data shown in the image, do we really need that much?


(Figure: svm_basics1.png)

Consider another idea. We find a line, $f(x) = ax_1 + bx_2 + c$, which divides the data into two regions. When we get a new test sample $X$, we just substitute it into $f(x)$. If $f(X) > 0$, it belongs to the blue group; otherwise it belongs to the red group. We can call this line the Decision Boundary. It is very simple and memory-efficient. Data that can be divided into two groups with a straight line (or a hyperplane in higher dimensions) is called Linearly Separable.

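The idea can be sketched in a few lines of NumPy. The coefficients a, b, c below are made-up values chosen only for illustration (they are not taken from the tutorial); the point is the sign test on f(x):

```python
import numpy as np

# Hypothetical coefficients of the decision boundary f(x) = a*x1 + b*x2 + c
a, b, c = 1.0, -1.0, 0.5

def classify(x):
    """Blue group if f(x) > 0, otherwise red group."""
    f = a * x[0] + b * x[1] + c
    return "blue" if f > 0 else "red"

print(classify(np.array([2.0, 0.5])))   # f =  2.0 -> blue
print(classify(np.array([0.0, 1.5])))   # f = -1.0 -> red
```

Only a, b and c need to be stored to classify any new point, which is where the memory saving over kNN comes from.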

So in the above image, you can see that many such lines are possible. Which one should we take? Intuitively, the line should pass as far as possible from all the points. Why? Because there can be noise in the incoming data, and this noise should not affect the classification accuracy. Taking the farthest line therefore provides more immunity against noise. So what SVM does is find a straight line (or hyperplane) with the largest minimum distance to the training samples. See the bold line in the image below passing through the center.


(Figure: svm_basics2.png)

So to find this decision boundary, you need training data. Do you need all of it? No. The samples close to the opposite group are sufficient. In our image, they are the one blue filled circle and the two red filled squares. We call them Support Vectors, and the lines passing through them are called Support Planes. They are adequate for finding our decision boundary; we need not worry about all the data. This helps with data reduction.

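As a concrete illustration, the sketch below trains a linear SVM on a small made-up 2-D data set with OpenCV's Python bindings (cv2.ml, as in OpenCV 3.x) and reads back the support vectors. The toy points and the C value are assumptions for the example, not from the tutorial:

```python
import cv2
import numpy as np

# Made-up 2-D training data: two well-separated clusters
trainData = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],    # red class
                      [6.0, 6.0], [6.5, 5.5], [7.0, 6.5]],   # blue class
                     dtype=np.float32)
labels = np.array([-1, -1, -1, 1, 1, 1], dtype=np.int32)

svm = cv2.ml.SVM_create()
svm.setType(cv2.ml.SVM_C_SVC)        # C-support vector classification
svm.setKernel(cv2.ml.SVM_LINEAR)     # straight-line decision boundary
svm.setC(1.0)
svm.train(trainData, cv2.ml.ROW_SAMPLE, labels)

# The support vectors are the samples that define the boundary
# (for a linear SVM, OpenCV may store them in a compressed form).
print(svm.getSupportVectors())

# predict() returns (retval, results); a point near the red cluster
# should come back with label -1.
_, result = svm.predict(np.array([[2.0, 2.0]], dtype=np.float32))
print(result)
```

Only the support vectors, not the whole training set, are needed at prediction time, which is exactly the data reduction mentioned above.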

What happens is that two hyperplanes are first found which best represent the data. For example, blue data is represented by $w^T x + b_0 > 1$ while red data is represented by $w^T x + b_0 < -1$, where $w$ is the weight vector ($w = [w_1, w_2, \ldots, w_n]$) and $x$ is the feature vector ($x = [x_1, x_2, \ldots, x_n]$). $b_0$ is the bias. The weight vector decides the orientation of the decision boundary, while the bias decides its location. The decision boundary is defined to be midway between these hyperplanes, so it is expressed as $w^T x + b_0 = 0$. The minimum distance from a support vector to the decision boundary is given by $distance_{support\,vectors} = \frac{1}{||w||}$. The margin is twice this distance, and we need to maximize it. That is, we need to minimize a new function $L(w, b_0)$ with some constraints, which can be expressed as:

$$\min_{w,\, b_0} L(w, b_0) = \frac{1}{2}||w||^2 \quad \text{subject to} \quad t_i(w^T x_i + b_0) \ge 1 \;\; \forall i$$

where $t_i$ is the label of each class, $t_i \in \{-1, 1\}$.

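For completeness, here is the standard one-line derivation (not spelled out in the tutorial) of the distance quoted above. A support vector $x_s$ lies on a support plane, $w^T x_s + b_0 = \pm 1$, and the distance from a point to the plane $w^T x + b_0 = 0$ is $|w^T x + b_0| / ||w||$, so

$$distance_{support\,vectors} = \frac{|w^T x_s + b_0|}{||w||} = \frac{1}{||w||}, \qquad \text{margin} = \frac{2}{||w||},$$

and maximizing the margin is the same as minimizing $\frac{1}{2}||w||^2$, which is exactly the objective above.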

Non-Linearly Separable Data

Consider some data which cannot be divided into two groups with a straight line. For example, consider one-dimensional data where 'X' is at -3 and +3 and 'O' is at -1 and +1. Clearly it is not linearly separable. But there are methods to solve these kinds of problems. If we map this data set with the function $f(x) = x^2$, we get 'X' at 9 and 'O' at 1, which are linearly separable.


Alternatively, we can convert this one-dimensional data to two-dimensional data. We can use the function $f(x) = (x, x^2)$ to map the data. Then 'X' becomes (-3, 9) and (3, 9), while 'O' becomes (-1, 1) and (1, 1). This is also linearly separable. In short, there is a good chance that data which is not linearly separable in a lower-dimensional space becomes linearly separable in a higher-dimensional space.

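Both mappings from the last two paragraphs are easy to check numerically. A minimal NumPy sketch, using exactly the toy points from the text:

```python
import numpy as np

# 1-D toy data from the text: class 'X' at -3 and +3, class 'O' at -1 and +1
X = np.array([-3.0, 3.0])
O = np.array([-1.0, 1.0])

# Map with f(x) = x**2: 'X' lands at 9, 'O' lands at 1,
# so a simple threshold (e.g. value > 4) separates them.
print(X ** 2)   # [9. 9.]
print(O ** 2)   # [1. 1.]

# Map to 2-D with f(x) = (x, x**2): the classes are now
# separable by a horizontal line such as x2 = 4.
print(np.column_stack([X, X ** 2]))   # [[-3.  9.] [ 3.  9.]]
print(np.column_stack([O, O ** 2]))   # [[-1.  1.] [ 1.  1.]]
```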

In general, it is possible to map points in a d-dimensional space to some D-dimensional space (D > d) to check the possibility of linear separability. There is an idea which helps to compute the dot product in the high-dimensional (kernel) space by performing computations in the low-dimensional input (feature) space. We can illustrate this with the following example.



Consider two points in two-dimensional space, $p = (p_1, p_2)$ and $q = (q_1, q_2)$. Let $\phi$ be a mapping function which maps a two-dimensional point to three-dimensional space as follows:

$$\phi(p) = (p_1^2,\; p_2^2,\; \sqrt{2}\, p_1 p_2), \qquad \phi(q) = (q_1^2,\; q_2^2,\; \sqrt{2}\, q_1 q_2)$$

Let us define a kernel function $K(p, q)$ which does a dot product between two points, shown below:

$$K(p, q) = \phi(p) \cdot \phi(q) = \phi(p)^T \phi(q) = (p_1^2, p_2^2, \sqrt{2}\, p_1 p_2) \cdot (q_1^2, q_2^2, \sqrt{2}\, q_1 q_2) = p_1^2 q_1^2 + p_2^2 q_2^2 + 2 p_1 q_1 p_2 q_2 = (p_1 q_1 + p_2 q_2)^2 = (p \cdot q)^2$$


It means that a dot product in three-dimensional space can be achieved using the squared dot product in two-dimensional space. This can be applied to higher-dimensional spaces. So we can calculate higher-dimensional features from the lower dimensions themselves. Once we map them, we get a higher-dimensional space.

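The identity $\phi(p) \cdot \phi(q) = (p \cdot q)^2$ can be verified in a couple of lines. The two example points below are arbitrary, chosen just for the check:

```python
import numpy as np

def phi(v):
    """Explicit map to 3-D: (v1^2, v2^2, sqrt(2)*v1*v2)."""
    return np.array([v[0] ** 2, v[1] ** 2, np.sqrt(2) * v[0] * v[1]])

p = np.array([1.0, 2.0])
q = np.array([3.0, 0.5])

lhs = phi(p) @ phi(q)     # dot product computed in the 3-D (kernel) space
rhs = (p @ q) ** 2        # same value computed entirely in the 2-D input space
print(lhs, rhs)           # both 16.0
assert np.isclose(lhs, rhs)
```

This is the kernel trick: the right-hand side never forms the higher-dimensional vectors at all.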

In addition to all these concepts, there is the problem of misclassification. Just finding the decision boundary with maximum margin is not sufficient; we also need to consider misclassification errors. Sometimes it may be possible to find a decision boundary with a smaller margin but with fewer misclassifications. In any case, we need to modify our model so that it finds the decision boundary with maximum margin but with less misclassification. The minimization criterion is modified as:

$$\min ||w||^2 + C \, (\text{distance of misclassified samples to their correct regions})$$


The image below shows this concept. For each sample of the training data a new parameter $\xi_i$ is defined. It is the distance from the corresponding training sample to its correct decision region. For samples that are not misclassified, they fall on their corresponding support planes, so their distance is zero.


(Figure: svm_basics3.png)

So the new optimization problem is :

$$\min_{w,\, b_0} L(w, b_0) = ||w||^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i(w^T x_i + b_0) \ge 1 - \xi_i \ \text{and} \ \xi_i \ge 0 \;\; \forall i$$

How should the parameter C be chosen? It is obvious that the answer to this question depends on how the training data is distributed. Although there is no general answer, it is useful to take into account these rules:

  • Large values of C give solutions with fewer misclassification errors but a smaller margin. In this case misclassification errors are expensive: since the aim of the optimization is to minimize the objective, few misclassification errors are allowed. (A larger C is reasonable when the data has little noise.)

  • Small values of C give solutions with a bigger margin and more classification errors. In this case the minimization weights the sum term less, so it focuses more on finding a hyperplane with a big margin. (A smaller C is reasonable when the data is noisy; see the sketch after this list.)

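In OpenCV's Python API the trade-off is exposed through setC(). The sketch below uses made-up, slightly overlapping clusters (the exact accuracies depend on the random draw); it only shows where C is set and how two values can be compared:

```python
import cv2
import numpy as np

# Made-up noisy 2-D data: the clusters overlap a little, so the choice of C
# trades margin width against misclassification on the training set.
rng = np.random.RandomState(0)
class_a = (rng.randn(20, 2) + [2.0, 2.0]).astype(np.float32)
class_b = (rng.randn(20, 2) + [4.0, 4.0]).astype(np.float32)
trainData = np.vstack([class_a, class_b])
labels = np.array([-1] * 20 + [1] * 20, dtype=np.int32)

for C in (0.1, 100.0):   # small C: wider margin; large C: fewer training errors
    svm = cv2.ml.SVM_create()
    svm.setType(cv2.ml.SVM_C_SVC)
    svm.setKernel(cv2.ml.SVM_LINEAR)
    svm.setC(C)
    svm.train(trainData, cv2.ml.ROW_SAMPLE, labels)
    _, pred = svm.predict(trainData)
    acc = np.mean(pred.ravel() == labels)
    print("C = %5.1f  training accuracy = %.2f" % (C, acc))
```

In practice C is usually chosen by cross-validation rather than by hand.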

Additional Resources

  1. NPTEL notes on Statistical Pattern Recognition, Chapters 25-29.

Exercises

