OpenCV3 Machine Learning Algorithms: SVM with Python

openthings


http://docs.opencv.org/master/d4/db1/tutorial_py_svm_basics.html

Goal

In this chapter

• We will gain an intuitive understanding of SVM

Theory

Linearly Separable Data

Consider the image below, which has two types of data, red and blue. In kNN, for each test sample we measured its distance to all the training samples and took the one with minimum distance. Measuring all the distances takes plenty of time, and storing all the training samples takes plenty of memory. But considering the data shown in the image, do we really need all of that?

image

Consider another idea. We find a line, \(f(x) = ax_1 + bx_2 + c\), which divides the data into two regions. When we get a new test sample \(X\), we just substitute it into \(f(x)\). If \(f(X) > 0\), it belongs to the blue group; otherwise it belongs to the red group. We can call this line the Decision Boundary. It is very simple and memory-efficient. Data which can be divided into two groups with a straight line (or a hyperplane in higher dimensions) is called linearly separable.
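The sign test above can be sketched in a few lines of Python. The coefficients a, b, c here are hypothetical, chosen only to illustrate how a learned decision boundary classifies a new point:

```python
# Minimal sketch of classifying with a decision boundary
# f(x) = a*x1 + b*x2 + c.  The coefficients are made up for
# illustration; in practice they come from training.
a, b, c = 1.0, -1.0, 0.5

def classify(x1, x2):
    """Return 'blue' if f(x) > 0, else 'red'."""
    return "blue" if a * x1 + b * x2 + c > 0 else "red"

print(classify(3.0, 1.0))   # f = 2.5  -> 'blue'
print(classify(0.0, 2.0))   # f = -1.5 -> 'red'
```

Note that only the three coefficients need to be stored, in contrast to kNN, which keeps every training sample in memory.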

In the image above, you can see that plenty of such lines are possible. Which one should we take? Intuitively, the line should pass as far as possible from all the points. Why? Because there can be noise in the incoming data, and this noise should not affect the classification accuracy. So taking the farthest line provides more immunity against noise. What SVM does is find a straight line (or hyperplane) with the largest minimum distance to the training samples. See the bold line in the image below passing through the center.

image

So to find this decision boundary, you need training data. Do you need all of it? No. Just the samples which are close to the opposite group are sufficient. In our image, they are one blue filled circle and two red filled squares. We can call them Support Vectors, and the lines passing through them are called Support Planes. They are adequate for finding our decision boundary; we need not worry about all the data. This helps in data reduction.

What happens is that first two hyperplanes are found which best represent the data. For example, blue data is represented by \(w^T x + b_0 > 1\) while red data is represented by \(w^T x + b_0 < -1\), where \(w\) is the weight vector (\(w = [w_1, w_2, \dots, w_n]\)) and \(x\) is the feature vector (\(x = [x_1, x_2, \dots, x_n]\)). \(b_0\) is the bias. The weight vector decides the orientation of the decision boundary, while the bias decides its location. The decision boundary is defined to be midway between these hyperplanes, expressed as \(w^T x + b_0 = 0\). The minimum distance from a support vector to the decision boundary is given by \(distance_{support\,vectors} = \frac{1}{||w||}\). The margin is twice this distance, and we need to maximize it, i.e. we need to minimize a new function \(L(w, b_0)\) with some constraints, which can be expressed as:

\[\min_{w, b_0} L(w, b_0) = \frac{1}{2} ||w||^2 \quad \text{subject to} \quad t_i (w^T x + b_0) \ge 1 \;\; \forall i\]

where \(t_i\) is the label of each class, \(t_i \in \{-1, 1\}\).

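As a quick numeric illustration of the distance formula, here is a small Python sketch with a hypothetical weight vector w (the values are made up; only the \(1/||w||\) and \(2/||w||\) relations come from the text):

```python
import math

# For a hypothetical weight vector w, the distance from a support
# plane to the decision boundary is 1/||w||, and the margin (the gap
# between the two support planes) is 2/||w||.
w = [3.0, 4.0]                              # hypothetical weight vector
w_norm = math.sqrt(sum(c * c for c in w))   # ||w|| = 5.0

support_distance = 1.0 / w_norm             # 0.2
margin = 2.0 / w_norm                       # 0.4
print(support_distance, margin)
```

This makes concrete why minimizing \(||w||^2\) maximizes the margin: a smaller \(||w||\) means a larger \(2/||w||\).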

Non-Linearly Separable Data

Consider some data which cannot be divided into two groups with a straight line. For example, consider one-dimensional data where 'X' is at -3 and +3 and 'O' is at -1 and +1. Clearly it is not linearly separable. But there are methods to solve these kinds of problems. If we map this data set with the function \(f(x) = x^2\), we get 'X' at 9 and 'O' at 1, which are linearly separable.

Alternatively, we can convert this one-dimensional data to two-dimensional data. We can use the function \(f(x) = (x, x^2)\) to map the data. Then 'X' becomes (-3, 9) and (3, 9) while 'O' becomes (-1, 1) and (1, 1). This is also linearly separable. In short, a data set that is not linearly separable in a lower-dimensional space has a better chance of becoming linearly separable in a higher-dimensional space.
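The one-dimensional example above can be checked directly in Python; the separating threshold \(x_2 = 5\) is an arbitrary choice between the two mapped groups:

```python
# Map the 1-D example ('X' at -3, +3; 'O' at -1, +1) with
# f(x) = (x, x**2); in the new space the line x2 = 5 separates them.
X_points = [-3, 3]
O_points = [-1, 1]

mapped_X = [(x, x**2) for x in X_points]   # [(-3, 9), (3, 9)]
mapped_O = [(x, x**2) for x in O_points]   # [(-1, 1), (1, 1)]

# All 'X' points lie above x2 = 5 and all 'O' points below it:
assert all(x2 > 5 for _, x2 in mapped_X)
assert all(x2 < 5 for _, x2 in mapped_O)
print(mapped_X, mapped_O)
```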

In general, it is possible to map points in a d-dimensional space to some D-dimensional space (D > d) to check for linear separability. There is an idea, the kernel trick, which helps to compute the dot product in the high-dimensional (feature) space by performing computations in the low-dimensional input space. We can illustrate this with the following example.

Consider two points in two-dimensional space, p=(p1,p2) and q=(q1,q2). Let ϕ be a mapping function which maps a two-dimensional point to three-dimensional space as follows:

\[\phi(p) = (p_1^2, p_2^2, \sqrt{2}\, p_1 p_2), \quad \phi(q) = (q_1^2, q_2^2, \sqrt{2}\, q_1 q_2)\]

Let us define a kernel function K(p,q) which does a dot product between two points, shown below:

\[K(p, q) = \phi(p) \cdot \phi(q) = \phi(p)^T \phi(q) = (p_1^2, p_2^2, \sqrt{2}\, p_1 p_2) \cdot (q_1^2, q_2^2, \sqrt{2}\, q_1 q_2) = p_1^2 q_1^2 + p_2^2 q_2^2 + 2 p_1 q_1 p_2 q_2 = (p_1 q_1 + p_2 q_2)^2 = (p \cdot q)^2\]

It means that a dot product in three-dimensional space can be achieved using the squared dot product in two-dimensional space. This can be applied to higher-dimensional spaces as well, so we can calculate higher-dimensional features from the lower dimensions themselves. Once we map the points, we get a higher-dimensional space.
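The kernel identity \(\phi(p) \cdot \phi(q) = (p \cdot q)^2\) can be verified numerically; the points p and q below are arbitrary:

```python
import math

def phi(v):
    """Map a 2-D point to 3-D: (v1^2, v2^2, sqrt(2)*v1*v2)."""
    v1, v2 = v
    return (v1**2, v2**2, math.sqrt(2) * v1 * v2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

p, q = (1.0, 2.0), (3.0, 4.0)
lhs = dot(phi(p), phi(q))   # dot product computed in 3-D space
rhs = dot(p, q) ** 2        # squared dot product computed in 2-D space
print(lhs, rhs)             # equal up to floating-point rounding
```

The left-hand side touches the three-dimensional vectors explicitly; the right-hand side never leaves two dimensions, which is exactly the saving the kernel trick provides.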

In addition to all these concepts, there is the problem of misclassification. Just finding the decision boundary with maximum margin is not sufficient; we need to consider misclassification errors as well. Sometimes it may be possible to find a decision boundary with less margin but fewer misclassifications. We therefore modify our model so that it finds the decision boundary with maximum margin but with less misclassification. The minimization criterion is modified as:

\[\min ||w||^2 + C \, (\text{distance of misclassified samples to their correct regions})\]

The image below shows this concept. For each sample of the training data a new parameter \(\xi_i\) is defined: it is the distance from the corresponding training sample to its correct decision region. Samples that are not misclassified fall on their corresponding support planes, so their distance is zero.

image

So the new optimization problem is:

\[\min_{w, b_0} L(w, b_0) = ||w||^2 + C \sum_i \xi_i \quad \text{subject to} \quad y_i (w^T x_i + b_0) \ge 1 - \xi_i \;\; \text{and} \;\; \xi_i \ge 0 \;\; \forall i\]

How should the parameter C be chosen? Obviously, the answer depends on how the training data is distributed. Although there is no general answer, it is useful to keep these rules in mind:

• Large values of C give solutions with fewer misclassification errors but a smaller margin. In this case misclassification errors are expensive: since the aim of the optimization is to minimize the objective, few misclassification errors are allowed. (When the noise in the data is small, a larger C can be chosen.)

• Small values of C give solutions with a bigger margin but more misclassification errors. In this case the minimization does not weight the sum term as much, so it focuses more on finding a hyperplane with a big margin. (When the data is noisy, a smaller C should be considered.)

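The trade-off controlled by C can be illustrated with a toy computation; the margin sizes and slack values below are invented solely to show how the objective \(||w||^2 + C \sum_i \xi_i\) picks a different candidate boundary as C changes:

```python
# Toy sketch of how C weights the two terms of the soft-margin
# objective L = ||w||^2 + C * sum(xi).  All numbers are hypothetical.
def objective(w_norm_sq, slacks, C):
    return w_norm_sq + C * sum(slacks)

# Candidate A: wide margin (small ||w||^2) but some slack (errors).
# Candidate B: narrow margin (large ||w||^2) but no slack.
A = dict(w_norm_sq=1.0, slacks=[0.5, 0.8])
B = dict(w_norm_sq=4.0, slacks=[])

for C in (0.1, 100.0):
    cost_A = objective(A["w_norm_sq"], A["slacks"], C)
    cost_B = objective(B["w_norm_sq"], B["slacks"], C)
    winner = "A (wide margin)" if cost_A < cost_B else "B (no errors)"
    print(f"C={C}: picks {winner}")
```

With C = 0.1 the slack term barely matters and the wide-margin candidate wins; with C = 100 the misclassification penalty dominates and the error-free candidate wins, matching the two rules above.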

Exercises
