Pocket Perceptron 算法

原创
2017/08/03 22:16
阅读数 454

Perceptron (感知器)算法是存在一定限制的

  1. Perceptron 算法针对的数据必须是线性可分的!
  2. Perceptron 算法要运行多久才能运行完毕?

在上一篇文章中(感知器算法),如果出现如下样本:

              orgi_samples = [
                [[20, 10], -1], 
                [[23, 12], -1],
                [[20, 8], -1], 
                [[25, 14], 1], 
                [[28, 15], 1],
                [[30, 38], 1], 
                [[1, 1], 1], # 噪声样本 
                ]

其中有一个是模拟的噪声,那么整个程序就会陷入死循环,感知器整个就会陷入混乱,永远找不到要求的阈值。那么就需要在容忍噪声样本的情况下,在所有阈值中选取一个错误率最小的,那么用公式表示就是:

可惜的是这要进行多少次纠正才能选取所有的阈值,依然是一个未知数,所以必须是控制指定轮次,在有限阈值的情况下进行选取。

具体做法即为 Pocket Perceptron 算法,其核心就是把每次(t)的 W(t) 与最新的 W(t+1) 的成功次数进行比较,如果成功次数 W(t+1) 比较多,则对 Pocket 中的 W 进行替换,这一过程就是其实就是贪心算法。需要注意的是,这个最终的 W,不是完美的,其最终不一定是全部满足所有样本的,即可能是忽略一些噪声的。

Pocket Perceptron 算法步骤:

  1. 初始化 pocket 的 weight 为 W
  2. 轮次 从 t = 0,1,2 ... n (n 为人为控制的次数)
  • (1) 找到一个随机的错误  被叫做  
  • (2) 尝试通过下面的方法修正错误:
  • (3) 如果   让错误更少发生,那么就让  替代掉  ,直到迭代结束。注意,这里迭代次数是由用户自主决定的。
  • (4) 最终取得的 W (被叫做 ) 作为 g。
#coding=utf-8

import numpy as np
import operator
import Queue

class PocketPerceptron(object):

    def __init__(self, orgi_weight = [], orgi_samples = [], run_trun_times = 2):
        """
        Args:
            orgi_weight:  原始权重
                例子:
                orgi_weight = [3, 7]
                
            orgi_samples: 原始样本
                例子:
                orgi_samples = [
                [[20, 10], -1], 
                [[23, 12], -1],
                [[20, 8], -1], 
                [[25, 14], 1], 
                [[28, 15], 1],
                [[30, 38], 1], 
                [[1, 1], 1], # 噪声样本 
                ]
                    
            run_trun_times: 运行轮次倍数(样本数量的倍数),默认为样本数量的2倍
        """
        self.weight_t = 0
        self.weight = np.array(orgi_weight)
        self.orgi_samples = orgi_samples
        # 最大运行轮次为 样本数量的 倍数
        self.max_turn = run_trun_times * len(orgi_samples)
        # 用于存放最佳结果的口袋
        self.pocket = {
            'current_weight_t': 0, 
            'max_succ_cnt': 0
        }
        
        # 用于已经计算过的队列,减少统计时的计算量
        self.out_queue = Queue.Queue()
    
    def add_orgi_sample(self, new_sample=[]):
        self.orgi_samples.append(new_sample)
    
    def statistics_matching(self, sample_t, samples, results):
        # print 'current_wt:', self.weight_t, 'queue_size:', self.out_queue.qsize()
        succ_list = []
        while not self.out_queue.empty():
            elem = self.out_queue.get()
            # print 'current_wt:', self.weight_t, 'queue_inx:', elem
            succ_list.append(elem)
        
        matching_cnt = 0
        for i, elem in enumerate(samples):
            if i in succ_list:
                matching_cnt += 1
                continue
                
            out = (self.weight_t * sample_t) + np.dot(self.weight, elem)
            #print self.weight_t, i, np.sign(out) == results[i]
            if np.sign(out) == results[i]:
                # print 'current_wt:', self.weight_t, 'calc_inx:', i
                matching_cnt += 1
                
        return matching_cnt
    
    def greed(self, sample_t, samples, results):
        """
        贪心函数,一旦发现成功次数多的 weight_t ,立即对 pocket 中的 weight_t 进行替换
        """
        current_weight_t_succ_cnt = self.statistics_matching(sample_t, samples, results)

        if self.pocket['max_succ_cnt'] < current_weight_t_succ_cnt:
            self.pocket['current_weight_t'] = self.weight_t
            self.pocket['max_succ_cnt'] = current_weight_t_succ_cnt
            
    
    def learning(self):
        samples = []
        results = []

        for s in self.orgi_samples:
            samples.append(np.array(s[0]))
            results.append(np.array(s[1]))
            
        sample_t = 1
        inx = 0
        run_turn = 0
        
        self.pocket['current_weight_t'] = self.weight_t
        self.pocket['max_succ_cnt'] = 0
        while inx < len(samples):
            # print 'turn: ', inx, ', w:', self.weight_t   
            out = (self.weight_t * sample_t) + np.dot(self.weight, samples[inx])
            
            if np.sign(out) != results[inx]:
                self.greed(sample_t, samples, results)
                #print 'curr pocket:', self.pocket['current_weight_t'], self.pocket['max_succ_cnt']
                self.weight_t = self.weight_t + (results[inx] - out) * sample_t
                inx = 0
                run_turn += 1
                if run_turn > self.max_turn:
                    break
            else:
                self.out_queue.put(inx)
                inx += 1
        
        self.greed(sample_t, samples, results)
        #print 'finally pocket:', self.pocket['current_weight_t'], self.pocket['max_succ_cnt']
        self.weight_t = self.pocket['current_weight_t']
        print 'finally weight:', self.weight_t
        
    
    def judge(self, new_sample=[]):
        new_sample = np.array(new_sample)
        out = np.dot(self.weight, new_sample) + self.weight_t
        return np.sign(out)


if __name__ == '__main__':

    orgi_samples = [
        [[31, 20], 1], 
        [[20, 10], -1], 
        [[20, 15], -1],
        [[20, 8], -1], 
        [[40, 23], 1],
        [[30, 38], 1], 
        [[1, 1], 1], # 噪声样本 
        [[2, 2], 1], # 噪声样本 
        [[30, 26], 1],
        ]

    orgi_weight = [3, 7]

    print u'输入样本进行学习'
    pocket_perceptron = PocketPerceptron(orgi_weight, orgi_samples, 2)
    pocket_perceptron.learning()
    print ''
    
    print u'样本验证'
    for sample in orgi_samples:
       judge = pocket_perceptron.judge(sample[0]) 
       print sample[0], 'judge out:', judge, 'sample out:', sample[1]
    print ''
    
    print u'新建一个输入,查看此样本是否能被批准'
    # 新建一个输入,查看此样本是否能被批准
    print "30, 8: ", pocket_perceptron.judge([30, 8]) # 1
    print ''
    
    # 添加样本重新学习
    print u'添加样本重新学习'
    pocket_perceptron.add_orgi_sample([[30, 8], -1])
    pocket_perceptron.learning()
    print ''
    
    print u"新样本验证"
    print "30, 8: ", pocket_perceptron.judge([30, 8]) # -1
    print "30, 11: ", pocket_perceptron.judge([30, 11]) # 1
    
    

运行结果:

输入样本进行学习
finally weight: -166

样本验证
[31, 20] judge out: 1 sample out: 1
[20, 10] judge out: -1 sample out: -1
[20, 15] judge out: -1 sample out: -1
[20, 8] judge out: -1 sample out: -1
[40, 23] judge out: 1 sample out: 1
[30, 38] judge out: 1 sample out: 1
[1, 1] judge out: -1 sample out: 1
[2, 2] judge out: -1 sample out: 1
[30, 26] judge out: 1 sample out: 1

新建一个输入,查看此样本是否能被批准
30, 8:  -1

添加样本重新学习
finally weight: -166

新样本验证
30, 8:  -1
30, 11:  1

 

展开阅读全文
加载中
点击引领话题📣 发布并加入讨论🔥
0 评论
0 收藏
0
分享
返回顶部
顶部