Implementing the KNN Algorithm in Python

### 1. Processing the Data

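``````python
import csv
import random

# filename: path to the CSV file; split: probability that a row goes to the training set
# trainingSet / testSet: lists that are filled in place
# NOTE: the signature line and the csv.reader call were missing from the source;
# the name loadDataset and the parameter order are assumptions.
def loadDataset(filename, split, trainingSet, testSet):
    with open(filename, 'r') as csvfile:  # open the file
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset)-1):  # the last row is skipped, as in the original
            for y in range(4):  # convert the first four columns to floats
                dataset[x][y] = float(dataset[x][y])
            # random.random() returns a float n with 0 <= n < 1.0, so each row
            # lands in the training set with probability `split`. A 67/33
            # train/test ratio is a common convention, hence split is usually 0.66.
            if random.random() < split:
                trainingSet.append(dataset[x])
            else:
                testSet.append(dataset[x])
``````

``````python
trainingSet = []
testSet = []
# 'data.csv' is a placeholder path (assumption): four numeric columns followed by a label
loadDataset('data.csv', 0.66, trainingSet, testSet)
print('trainingSet', repr(len(trainingSet)))
print('testSet', repr(len(testSet)))
``````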
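Because the split is random, the two set sizes change from run to run. The sketch below shows one way to make the split reproducible by seeding a private random generator. The helper name `split_rows` and its `seed` parameter are illustrative assumptions, not part of the original code, and it works on in-memory rows instead of a file.

```python
import random

def split_rows(rows, split, seed=None):
    """Randomly assign each row to a training or test list.

    Hypothetical helper: `split` is the probability of landing in the
    training list, `seed` fixes the random sequence for repeatable runs.
    """
    rng = random.Random(seed)  # private generator, global random state is untouched
    training, test = [], []
    for row in rows:
        if rng.random() < split:
            training.append(row)
        else:
            test.append(row)
    return training, test

# Made-up rows: two numeric features followed by a label
rows = [[float(i), float(i) * 2, 'a' if i % 2 else 'b'] for i in range(20)]
training, test = split_rows(rows, 0.66, seed=42)
print(len(training), len(test))  # the two sizes always add up to 20
```

Calling `split_rows` again with the same seed returns the same partition, which helps when comparing accuracy between runs.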

### 2. Similarity

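``````python
import math

# length: number of leading dimensions to compare; later entries (such as the label) are ignored
def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)  # sum the squared differences over the compared dimensions
    return math.sqrt(distance)
``````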

``````data1 = [2,2,2,'a']
data2 = [4,4,4,'b']
# length=3: only the first three dimensions are used
distance = euclideanDistance(data1,data2,3)
print('distance',repr(distance))
``````
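For the sample data the result is the square root of (4-2)² × 3 = 12, roughly 3.464. As a cross-check, the same value can be obtained from the standard library, assuming Python 3.8 or later where `math.dist` is available. The wrapper name `euclidean_distance_builtin` is made up for this sketch.

```python
import math

def euclidean_distance_builtin(instance1, instance2, length):
    """Euclidean distance over the first `length` dimensions.

    Slicing drops the trailing label so math.dist only sees numbers.
    """
    return math.dist(instance1[:length], instance2[:length])

data1 = [2, 2, 2, 'a']
data2 = [4, 4, 4, 'b']
d = euclidean_distance_builtin(data1, data2, 3)
print('distance', d)  # equals math.sqrt(12), about 3.464
```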

### 3. Nearest Neighbors

``````python
import operator

# testInstance: the instance to predict; its last element is assumed to be a label slot
def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance)-1
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))  # closest first
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors
``````

``````trainSet = [[2,2,2,'a'],[4,4,4,'b']]
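# testInstance carries no label here, so length = len(testInstance)-1 = 2 and only the first two dimensions are compared
testInstance = [5,5,5]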
k = 1
neighbors = getNeighbors(trainSet,testInstance,k)
print(neighbors)
``````
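Sorting every distance is fine for small data, but only the k smallest are needed. The following sketch uses `heapq.nsmallest` from the standard library as an alternative. The function name `get_neighbors_heap` and the explicit `length` parameter are assumptions made for this example, so that the test instance does not need a label slot.

```python
import heapq
import math

def get_neighbors_heap(training_set, test_instance, k, length):
    """Return the k training rows closest to test_instance.

    Hypothetical variant: `length` says how many leading dimensions
    to compare, so the label position never enters the distance.
    """
    def dist(row):
        return math.sqrt(sum((row[i] - test_instance[i]) ** 2 for i in range(length)))

    # nsmallest keeps only k candidates instead of sorting the full list
    return heapq.nsmallest(k, training_set, key=dist)

train_set = [[2, 2, 2, 'a'], [4, 4, 4, 'b']]
nearest = get_neighbors_heap(train_set, [5, 5, 5], 1, 3)
print(nearest)
```

With all three dimensions compared, `[4, 4, 4, 'b']` is still the closest row to `[5, 5, 5]`.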

### 4. Prediction Result

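``````python
# The body was truncated in the source; the vote counting below is a reconstruction (assumption).
def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):  # iterate over the nearest neighbors
        response = neighbors[x][-1]  # the label is assumed to be the last element of each instance
        classVotes[response] = classVotes.get(response, 0) + 1  # one vote per neighbor
    return max(classVotes, key=classVotes.get)  # label with the most votes
``````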

``````neighbors= [[1,1,1,'a'],[2,2,2,'a'],[3,3,3,'b']]
response = getResponse(neighbors)
print(response)
``````
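The prediction is a majority vote over the neighbors' labels. A compact way to express that vote, shown here as a sketch and not as the author's code, is `collections.Counter`. The function name `get_response_counter` is hypothetical.

```python
from collections import Counter

def get_response_counter(neighbors):
    """Return the most frequent label among the neighbors.

    Assumes the label is the last element of each neighbor row.
    """
    votes = Counter(row[-1] for row in neighbors)
    # most_common(1) returns a list holding one (label, count) pair
    return votes.most_common(1)[0][0]

sample_neighbors = [[1, 1, 1, 'a'], [2, 2, 2, 'a'], [3, 3, 3, 'b']]
winner = get_response_counter(sample_neighbors)
print(winner)  # 'a' has two votes, 'b' has one
```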

### 5. Accuracy

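``````python
# predictions: the predicted labels for testSet, in the same order
def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:  # compare by value; `is` tests identity and is unreliable for strings
            correct += 1
    return (correct/float(len(testSet))) * 100.0
``````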

``````testSet = [[1,1,1,'a'],[2,2,2,'a'],[3,3,3,'b']]
predictions = ['a','a','a']
accuracy = getAccuracy(testSet,predictions)
print(accuracy)
``````
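To see how the five steps fit together, here is a minimal end-to-end sketch. It is self-contained, so the helpers are rewritten in condensed form under new names (`euclidean_distance`, `predict`, `accuracy`). The tiny dataset is made up for illustration, and the train/test split is fixed by hand instead of being read from a file.

```python
import math
from collections import Counter

def euclidean_distance(a, b, length):
    # Distance over the first `length` dimensions only
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(length)))

def predict(training_set, test_instance, k):
    # The last element of test_instance is its label, so it is excluded
    length = len(test_instance) - 1
    ranked = sorted(
        training_set,
        key=lambda row: euclidean_distance(test_instance, row, length),
    )
    votes = Counter(row[-1] for row in ranked[:k])  # majority vote among k nearest
    return votes.most_common(1)[0][0]

def accuracy(test_set, predictions):
    correct = sum(1 for row, p in zip(test_set, predictions) if row[-1] == p)
    return correct / len(test_set) * 100.0

# Made-up data: two well separated clusters labelled 'a' and 'b'
training_set = [
    [1.0, 1.0, 'a'], [1.2, 0.8, 'a'], [0.9, 1.1, 'a'],
    [5.0, 5.0, 'b'], [5.2, 4.9, 'b'], [4.8, 5.1, 'b'],
]
test_set = [[1.1, 1.0, 'a'], [5.1, 5.0, 'b'], [0.8, 0.9, 'a']]

predictions = [predict(training_set, row, k=3) for row in test_set]
result = accuracy(test_set, predictions)
print(predictions, result)
```

Because the two clusters are far apart, every test row is classified correctly here. Real data will rarely be this clean, and the choice of `k` matters more.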
