# Machine Learning: A K-Means Algorithm Example

2019/10/11 11:03

### Background

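How K-means forms clusters:

1. K-means picks k points, one per cluster, known as centroids.

2. Each data point is assigned to the cluster whose centroid is closest, which yields k clusters.

3. The centroid of each cluster is recomputed from its current members, giving a new set of centroids.

4. With the new centroids in hand, steps 2 and 3 are repeated: each data point is reassigned to its nearest new centroid, forming k new clusters. This continues until convergence, that is, until the centroids no longer change.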
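The four steps above can be sketched directly in NumPy. This is only a minimal illustration, not the implementation scikit-learn uses: the function name `kmeans`, the random choice of initial centroids, the `max_iter` cap, the handling of empty clusters, and the toy data are all assumptions made for the example.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means on an (n_samples, n_features) float array (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members;
        # an empty cluster simply keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


# Two well-separated toy groups (made-up data for illustration).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(X, k=2)
print('Labels :', labels)
print('Centroids :', centroids)
```

Which group ends up as cluster 0 or cluster 1 depends on the initial centroids; the cluster indices are arbitrary names, and only the grouping itself is meaningful.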

```python
'''
The following code is for the K-Means
Created by - ANALYTICS VIDHYA
'''

# importing required libraries
import pandas as pd
from sklearn.cluster import KMeans

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data :',train_data.shape)
print('Shape of testing data :',test_data.shape)

# Now, we need to divide the training data into different clusters
# and predict in which cluster a particular data point belongs.

'''
Create the object of the K-Means model
You can also add other parameters and test your code here
Some parameters are : n_clusters and max_iter
Documentation of sklearn KMeans:
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
'''
model = KMeans()

# fit the model with the training data
model.fit(train_data)

# Number of Clusters
print('\nDefault number of Clusters : ',model.n_clusters)

# predict the clusters on the train dataset
predict_train = model.predict(train_data)
print('\nCLusters on train data',predict_train)

# predict the target on the test dataset
predict_test = model.predict(test_data)
print('Clusters on test data',predict_test)

# Now, we will train a model with n_cluster = 3
model_n3 = KMeans(n_clusters=3)

# fit the model with the training data
model_n3.fit(train_data)

# Number of Clusters
print('\nNumber of Clusters : ',model_n3.n_clusters)

# predict the clusters on the train dataset
predict_train_3 = model_n3.predict(train_data)
print('\nCLusters on train data',predict_train_3)

# predict the target on the test dataset
predict_test_3 = model_n3.predict(test_data)
print('Clusters on test data',predict_test_3)
```

```
Shape of training data : (100, 5)
Shape of testing data : (100, 5)

Default number of Clusters :  8

CLusters on train data [6 7 0 7 6 5 5 7 7 3 1 1 3 0 7 1 0 4 5 6 4 3 3 0 4
 0 1 1 0 3 4 3 3 0 0 1 2 1 4 3 0 2 1 1 0 3 3 0 7 1
 3 0 5 1 0 1 5 4 6 4 3 6 5 0 3 0 4 3 3 1 5 1 6 5 7
 7 6 3 5 3 5 3 1 5 2 5 0 3 2 3 4 7 1 0 1 5 3 6 1 6]
Clusters on test data [3 6 2 0 5 6 0 3 5 2 3 4 5 5 5 3 3 5 5 7 0 0 5 5 3
 5 0 6 5 0 1 6 3 5 6 0 1 7 3 0 0 6 2 0 5 3 5 7 3 3
 4 6 3 1 6 3 1 3 3 2 3 3 5 1 7 5 1 5 3 3 5 2 0 1 5
 0 3 0 3 6 3 5 4 0 2 6 3 5 6 0 6 4 3 5 0 6 6 6 1 0]

Number of Clusters :  3

CLusters on train data [2 0 1 0 2 1 2 0 0 2 0 0 2 1 0 0 1 2 2 2 2 2 2 1 2
 1 0 0 1 2 2 2 2 1 1 0 2 0 2 2 1 2 0 0 1 2 2 1 0 0
 2 1 2 0 1 0 2 2 2 2 2 2 2 1 2 1 2 2 2 0 1 0 2 2 0
 0 0 2 0 2 2 2 0 2 2 2 1 2 2 2 2 0 0 1 0 2 2 2 0 2]
Clusters on test data [2 2 2 1 2 2 1 2 2 2 2 2 2 1 1 2 2 2 2 0 1 1 2 2 2
 2 1 2 2 1 0 2 2 2 2 1 0 0 2 1 1 2 2 1 2 2 2 0 2 2
 2 2 2 0 2 2 0 2 2 2 2 2 2 0 0 2 0 2 2 2 0 2 1 0 2
 1 2 1 2 0 2 2 2 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 0 1]
```
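The CSV files used above are not included with this post, and because K-means starts from randomly chosen centroids, the cluster numbers printed can differ from one run to the next. As a rough, self-contained variant, the sketch below swaps in synthetic data from `make_blobs` and fixes `random_state`. The data, the 100/100 split, and every parameter value are assumptions chosen for illustration, not the original dataset or settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for train-data.csv / test-data.csv: 200 points with
# 5 features drawn around 3 centers (all values here are assumptions).
X, _ = make_blobs(n_samples=200, centers=3, n_features=5, random_state=42)
X_train, X_test = X[:100], X[100:]

# random_state fixes the centroid initialization so repeated runs agree;
# n_init and max_iter are set explicitly instead of relying on defaults.
model = KMeans(n_clusters=3, n_init=10, max_iter=300, random_state=42)
model.fit(X_train)

# assign each train and test point to its nearest learned centroid
train_labels = model.predict(X_train)
test_labels = model.predict(X_test)

print('Shape of cluster centers :', model.cluster_centers_.shape)
print('Points per cluster (train) :', np.bincount(train_labels, minlength=3))
print('Points per cluster (test) :', np.bincount(test_labels, minlength=3))
print('Inertia on train data :', model.inertia_)
```

`inertia_` is the sum of squared distances from each training point to its assigned centroid, so comparing it across different `n_clusters` values is one common way to judge how many clusters suit the data.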
