The Perceptron algorithm has some built-in limitations:
- The data the Perceptron algorithm is applied to must be linearly separable!
- How long does the Perceptron algorithm need to run before it finishes?
In the previous article (on the Perceptron algorithm), if samples like the following appear:
orgi_samples = [
    [[20, 10], -1],
    [[23, 12], -1],
    [[20, 8], -1],
    [[25, 14], 1],
    [[28, 15], 1],
    [[30, 38], 1],
    [[1, 1], 1],  # noise sample
]
one of them being a simulated noise sample, then the whole program falls into an infinite loop: the perceptron gets thrown off and can never find the required threshold. So we have to tolerate the noise samples and, among all candidate weights, pick the one with the smallest error rate. Expressed as a formula:

$$w_{g} \leftarrow \mathop{\arg\min}_{w} \sum_{n=1}^{N} \big[\!\big[\, y_n \neq \operatorname{sign}(w^{T} x_n) \,\big]\!\big]$$

Unfortunately, how many corrections it would take to enumerate all candidate weights is still unknown, so we must cap the number of rounds and pick the best weight from that finite set of candidates.
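As a concrete reading of this objective, the small sketch below (the helper name count_mistakes is illustrative, not part of the article's code) counts how many samples a candidate weight misclassifies; the pocket strategy simply keeps whichever candidate has the smallest count seen so far.

import numpy as np

def count_mistakes(w, b, samples):
    # samples: list of ([x1, x2], y) pairs with y in {-1, +1}
    # counts the n for which sign(w . x_n + b) != y_n
    return sum(1 for x, y in samples if np.sign(np.dot(w, x) + b) != y)

# e.g. count_mistakes(np.array([3, 7]), -166, [([20, 10], -1), ([25, 14], 1)]) returns 0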
The concrete remedy is the Pocket Perceptron algorithm. Its core idea: at each round t, compare the success count of the current W(t) with that of the newly corrected W(t+1); if W(t+1) succeeds on more samples, replace the W held in the pocket with it. This is essentially a greedy algorithm. Note that the final W is not perfect: it does not necessarily satisfy every sample, i.e. it may ignore some noise samples. The steps are listed next, followed by a minimal sketch.
Steps of the Pocket Perceptron algorithm:
- Initialize the pocket weight Ŵ with the initial W.
- For rounds t = 0, 1, 2, ..., n (n is a user-controlled limit):
- (1) Find a random mistake of W(t), i.e. a sample (x_n(t), y_n(t)) with sign(W(t) · x_n(t)) ≠ y_n(t).
- (2) Try to correct the mistake with the usual perceptron update: W(t+1) ← W(t) + y_n(t) · x_n(t).
- (3) If W(t+1) makes fewer mistakes than Ŵ, replace Ŵ with W(t+1), and keep going until the iterations end. Note that the number of iterations is decided by the user.
- (4) Return the final Ŵ (called W_POCKET) as g.
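For reference, here is a minimal, self-contained sketch of the steps above (the names pocket_pla, mistake_idx, and max_iters are illustrative and not taken from the article). It applies the standard perceptron correction to both the weight vector and the bias, whereas the article's implementation below only adjusts the threshold term weight_t.

import random
import numpy as np

def pocket_pla(samples, max_iters=100, seed=0):
    # samples: list of ([x1, x2], y) pairs with y in {-1, +1}
    random.seed(seed)
    xs = np.array([x for x, _ in samples], dtype=float)
    ys = np.array([y for _, y in samples])

    def mistake_idx(w, b):
        # indices of the samples that (w, b) currently misclassifies
        return np.where(np.sign(xs.dot(w) + b) != ys)[0]

    w, b = np.zeros(xs.shape[1]), 0.0          # current weights W(t)
    best_w, best_b = w.copy(), b               # pocket weights (best so far)
    best_err = len(mistake_idx(w, b))
    for _ in range(max_iters):
        wrong = mistake_idx(w, b)
        if len(wrong) == 0:                    # separable case: no mistakes left
            break
        i = random.choice(wrong)               # (1) pick a random mistake
        w, b = w + ys[i] * xs[i], b + ys[i]    # (2) correct it: W(t+1) = W(t) + y * x
        err = len(mistake_idx(w, b))
        if err < best_err:                     # (3) greedy: pocket the better weights
            best_w, best_b, best_err = w.copy(), b, err
    return best_w, best_b                      # (4) return the pocket weights as g

Run on samples like the orgi_samples above (converted to (x, y) pairs), a loop like this would typically settle on weights that ignore noise points such as [1, 1].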
#coding=utf-8
import numpy as np
import Queue


class PocketPerceptron(object):

    def __init__(self, orgi_weight = [], orgi_samples = [], run_turn_times = 2):
        """
        Args:
            orgi_weight: initial weights
                e.g.:
                    orgi_weight = [3, 7]
            orgi_samples: training samples
                e.g.:
                    orgi_samples = [
                        [[20, 10], -1],
                        [[23, 12], -1],
                        [[20, 8], -1],
                        [[25, 14], 1],
                        [[28, 15], 1],
                        [[30, 38], 1],
                        [[1, 1], 1],  # noise sample
                    ]
            run_turn_times: multiplier for the number of rounds (a multiple of the
                sample count); defaults to 2 times the number of samples
        """
        self.weight_t = 0
        self.weight = np.array(orgi_weight)
        self.orgi_samples = orgi_samples
        # the maximum number of rounds is a multiple of the sample count
        self.max_turn = run_turn_times * len(orgi_samples)
        # the pocket that holds the best result found so far
        self.pocket = {
            'current_weight_t': 0,
            'max_succ_cnt': 0
        }
        # queue of sample indices already verified, to cut down the counting work
        self.out_queue = Queue.Queue()
    def add_orgi_sample(self, new_sample=[]):
        self.orgi_samples.append(new_sample)

    def statistics_matching(self, sample_t, samples, results):
        # print 'current_wt:', self.weight_t, 'queue_size:', self.out_queue.qsize()
        succ_list = []
        while not self.out_queue.empty():
            elem = self.out_queue.get()
            # print 'current_wt:', self.weight_t, 'queue_inx:', elem
            succ_list.append(elem)
        matching_cnt = 0
        for i, elem in enumerate(samples):
            if i in succ_list:
                matching_cnt += 1
                continue
            out = (self.weight_t * sample_t) + np.dot(self.weight, elem)
            # print self.weight_t, i, np.sign(out) == results[i]
            if np.sign(out) == results[i]:
                # print 'current_wt:', self.weight_t, 'calc_inx:', i
                matching_cnt += 1
        return matching_cnt
    def greed(self, sample_t, samples, results):
        """
        Greedy step: as soon as the current weight_t achieves a higher success
        count, replace the weight_t kept in the pocket.
        """
        current_weight_t_succ_cnt = self.statistics_matching(sample_t, samples, results)
        if self.pocket['max_succ_cnt'] < current_weight_t_succ_cnt:
            self.pocket['current_weight_t'] = self.weight_t
            self.pocket['max_succ_cnt'] = current_weight_t_succ_cnt

    def learning(self):
        samples = []
        results = []
        for s in self.orgi_samples:
            samples.append(np.array(s[0]))
            results.append(np.array(s[1]))
        sample_t = 1
        inx = 0
        run_turn = 0
        self.pocket['current_weight_t'] = self.weight_t
        self.pocket['max_succ_cnt'] = 0
        while inx < len(samples):
            # print 'turn: ', inx, ', w:', self.weight_t
            out = (self.weight_t * sample_t) + np.dot(self.weight, samples[inx])
            if np.sign(out) != results[inx]:
                self.greed(sample_t, samples, results)
                # print 'curr pocket:', self.pocket['current_weight_t'], self.pocket['max_succ_cnt']
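                # Note: only the threshold term weight_t is corrected here; the
                # weight vector self.weight itself is never updated in this implementation.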
                self.weight_t = self.weight_t + (results[inx] - out) * sample_t
                inx = 0
                run_turn += 1
                if run_turn > self.max_turn:
                    break
            else:
                self.out_queue.put(inx)
                inx += 1
        self.greed(sample_t, samples, results)
        # print 'finally pocket:', self.pocket['current_weight_t'], self.pocket['max_succ_cnt']
        self.weight_t = self.pocket['current_weight_t']
        print 'finally weight:', self.weight_t

    def judge(self, new_sample=[]):
        new_sample = np.array(new_sample)
        out = np.dot(self.weight, new_sample) + self.weight_t
        return np.sign(out)
if __name__ == '__main__':
    orgi_samples = [
        [[31, 20], 1],
        [[20, 10], -1],
        [[20, 15], -1],
        [[20, 8], -1],
        [[40, 23], 1],
        [[30, 38], 1],
        [[1, 1], 1],  # noise sample
        [[2, 2], 1],  # noise sample
        [[30, 26], 1],
    ]
    orgi_weight = [3, 7]

    print 'Learn from the input samples'
    pocket_perceptron = PocketPerceptron(orgi_weight, orgi_samples, 2)
    pocket_perceptron.learning()
    print ''

    print 'Verify the training samples'
    for sample in orgi_samples:
        judge = pocket_perceptron.judge(sample[0])
        print sample[0], 'judge out:', judge, 'sample out:', sample[1]
    print ''

    print 'Create a new input and check whether it is accepted'
    # create a new input and check whether it is accepted
    print "30, 8: ", pocket_perceptron.judge([30, 8])  # -1
    print ''

    # add a sample and learn again
    print 'Add a sample and learn again'
    pocket_perceptron.add_orgi_sample([[30, 8], -1])
    pocket_perceptron.learning()
    print ''

    print "Verify the new samples"
    print "30, 8: ", pocket_perceptron.judge([30, 8])  # -1
    print "30, 11: ", pocket_perceptron.judge([30, 11])  # 1
Output:
Learn from the input samples
finally weight: -166
Verify the training samples
[31, 20] judge out: 1 sample out: 1
[20, 10] judge out: -1 sample out: -1
[20, 15] judge out: -1 sample out: -1
[20, 8] judge out: -1 sample out: -1
[40, 23] judge out: 1 sample out: 1
[30, 38] judge out: 1 sample out: 1
[1, 1] judge out: -1 sample out: 1
[2, 2] judge out: -1 sample out: 1
[30, 26] judge out: 1 sample out: 1
Create a new input and check whether it is accepted
30, 8: -1
Add a sample and learn again
finally weight: -166
Verify the new samples
30, 8: -1
30, 11: 1