AI - H2O - First Example

2019/06/24 23:54

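1 - The Iris dataset

The Iris dataset is a widely used benchmark for machine-learning classification experiments. It is very small, so it is quick to learn from. It contains 150 samples in 3 classes of 50 samples each, and every sample has 4 attributes:

  • Sepal.Length (sepal length), in cm
  • Sepal.Width (sepal width), in cm
  • Petal.Length (petal length), in cm
  • Petal.Width (petal width), in cm

These 4 attributes can be used to predict which of the following three species an iris flower belongs to:

  • Iris Setosa
  • Iris Versicolor
  • Iris Virginica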

2 - Deep learning on the Iris dataset in Python

2.1 - Code

# coding=utf-8
import h2o

h2o.init()  # by default the H2O instance may use all cores and typically takes about 25% of system memory

# Prepare the data
datasets = "https://raw.githubusercontent.com/DarrenCook/h2o/bk/datasets/"
data = h2o.import_file(datasets + "iris_wheader.csv")  # load the input data
y = "class"  # name of the column to predict; not needed for unsupervised learning
x = data.names  # names of the columns to learn from: here, every other column
x.remove(y)
train, test = data.split_frame([0.8])  # split into training and test data: roughly 80% for training, the rest for testing

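# Train the model
m = h2o.estimators.deeplearning.H2ODeepLearningEstimator()  # create the algorithm object with default settings
m.train(x, y, train)  # start training on the training frame only
print("# MSE:", m.mse())  # show the MSE (mean squared error) on the training data
print("# Confusion Matrix: \n", m.confusion_matrix(train))  # show the confusion matrix (correct counts per class, and which class was chosen when wrong)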

# Use the model to make predictions
p = m.predict(test)
print("# Predict: \n", p)  # only the first 10 rows are shown by default
print("# as_data_frame : \n", p.as_data_frame())  # show all rows
print("# mean: ", (p["predict"] == test["class"]).mean())  # fraction of correct predictions
print("# cbind: \n", p["predict"].cbind(test["class"]).as_data_frame())  # show predicted and actual class side by side

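# Some naming conventions
# - y: the column that H2O should predict (not needed for unsupervised learning)
# - x: the columns to learn from, either some of the columns or all the other columns
# - data: the complete data set
# - train: the subset used for training
# - valid: the subset used for validation
# - test: the subset used for testing
# In real projects, clearer and more meaningful short names are recommended.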
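
A note on the split: split_frame([0.8]) does not guarantee exactly 120 training rows. In the run shown in section 2.2, the confusion matrix covers 117 training rows and the prediction frame has 33 rows. H2O's documentation describes the split as probabilistic rather than exact. The following is a minimal pure-Python sketch of that idea under that assumption, not H2O's actual implementation; probabilistic_split and its seed value are hypothetical names chosen for illustration.

```python
import random


def probabilistic_split(n_rows, ratio, seed):
    """Assign each row index to train with probability `ratio`, else to test."""
    rng = random.Random(seed)
    train, test = [], []
    for i in range(n_rows):
        # Each row is decided independently, so the final sizes vary from run to run.
        (train if rng.random() < ratio else test).append(i)
    return train, test


train_idx, test_idx = probabilistic_split(150, 0.8, seed=42)
# Sizes are near 120/30 but are not guaranteed to be exactly that.
print(len(train_idx), len(test_idx))
```

For a repeatable split in H2O itself, split_frame also accepts a seed argument, for example data.split_frame(ratios=[0.8], seed=1234).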

2.2 - Output

D:\Temp\Anaconda3\envs\h2o\python.exe D:/Anliven/Anliven-Code/PycharmProjects/TempTest/TempTest_1.py
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
; Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
  Starting server from D:\Temp\Anaconda3\envs\h2o\lib\site-packages\h2o\backend\bin\h2o.jar
  Ice root: C:\Users\anliven\AppData\Local\Temp\tmptafn6xd_
  JVM stdout: C:\Users\anliven\AppData\Local\Temp\tmptafn6xd_\h2o_anliven_started_from_python.out
  JVM stderr: C:\Users\anliven\AppData\Local\Temp\tmptafn6xd_\h2o_anliven_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
--------------------------  ------------------------------------------
H2O cluster uptime:         02 secs
H2O cluster timezone:       +08:00
H2O data parsing timezone:  UTC
H2O cluster version:        3.24.0.5
H2O cluster version age:    6 days
H2O cluster name:           H2O_from_python_anliven_be1ik6
H2O cluster total nodes:    1
H2O cluster free memory:    10.64 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         accepting new members, healthy
H2O connection url:         http://127.0.0.1:54321
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         Amazon S3, Algos, AutoML, Core V3, Core V4
Python version:             3.6.2 final
--------------------------  ------------------------------------------
Parse progress: |█████████████████████████████████████████████████████████| 100%
deeplearning Model Build progress: |██████████████████████████████████████| 100%
# MSE: 0.039118900961189924
# Confusion Matrix: 
 Confusion Matrix: Row labels: Actual class; Column labels: Predicted class

Iris-setosa    Iris-versicolor    Iris-virginica    Error     Rate
-------------  -----------------  ----------------  --------  -------
40             0                  0                 0         0 / 40
0              34                 5                 0.128205  5 / 39
0              0                  38                0         0 / 38
40             34                 43                0.042735  5 / 117

deeplearning prediction progress: |███████████████████████████████████████| 100%
# Predict: 
 predict        Iris-setosa    Iris-versicolor    Iris-virginica
-----------  -------------  -----------------  ----------------
Iris-setosa       0.999995        5.26512e-06       1.22522e-23
Iris-setosa       0.999998        2.10502e-06       2.36894e-24
Iris-setosa       0.999996        4.30403e-06       1.68815e-23
Iris-setosa       0.99995         5.0415e-05        4.90541e-23
Iris-setosa       0.999999        1.23285e-06       4.16845e-24
Iris-setosa       0.999997        3.05992e-06       4.10819e-23
Iris-setosa       0.999946        5.44824e-05       5.15226e-22
Iris-setosa       0.999999        8.97722e-07       2.31546e-23
Iris-setosa       0.99999         9.56155e-06       1.59912e-23
Iris-setosa       1               3.44765e-07       4.95222e-24

[33 rows x 4 columns]

# as_data_frame : 
             predict   Iris-setosa  Iris-versicolor  Iris-virginica
0       Iris-setosa  9.999947e-01     5.265116e-06    1.225220e-23
1       Iris-setosa  9.999979e-01     2.105018e-06    2.368935e-24
2       Iris-setosa  9.999957e-01     4.304033e-06    1.688151e-23
3       Iris-setosa  9.999496e-01     5.041504e-05    4.905406e-23
4       Iris-setosa  9.999988e-01     1.232852e-06    4.168452e-24
5       Iris-setosa  9.999969e-01     3.059924e-06    4.108188e-23
6       Iris-setosa  9.999455e-01     5.448235e-05    5.152261e-22
7       Iris-setosa  9.999991e-01     8.977222e-07    2.315463e-23
8       Iris-setosa  9.999904e-01     9.561553e-06    1.599121e-23
9       Iris-setosa  9.999997e-01     3.447651e-07    4.952222e-24
10  Iris-versicolor  1.285173e-07     9.774696e-01    2.253031e-02
11  Iris-versicolor  8.456613e-05     9.979772e-01    1.938266e-03
12  Iris-versicolor  4.829308e-02     9.517061e-01    8.497348e-07
13  Iris-versicolor  4.169988e-07     9.999681e-01    3.150848e-05
14  Iris-versicolor  1.805217e-06     9.998308e-01    1.673994e-04
15  Iris-versicolor  8.759536e-05     9.999115e-01    8.606799e-07
16  Iris-versicolor  2.206746e-05     9.999167e-01    6.120105e-05
17  Iris-versicolor  3.302204e-06     9.998997e-01    9.695184e-05
18  Iris-versicolor  3.622209e-08     9.389008e-01    6.109913e-02
19  Iris-versicolor  9.407188e-03     9.905912e-01    1.631313e-06
20  Iris-versicolor  1.332645e-03     9.986596e-01    7.739634e-06
21   Iris-virginica  5.299107e-16     7.827116e-07    9.999992e-01
22   Iris-virginica  9.149237e-16     4.476949e-09    1.000000e+00
23   Iris-virginica  4.123180e-13     1.779434e-07    9.999998e-01
24   Iris-virginica  7.280032e-08     6.898109e-03    9.931018e-01
25   Iris-virginica  5.853220e-17     9.229382e-07    9.999991e-01
26   Iris-virginica  1.171212e-12     2.643036e-04    9.997357e-01
27   Iris-virginica  2.345086e-16     2.944686e-09    1.000000e+00
28   Iris-virginica  8.742579e-08     2.479772e-01    7.520227e-01
29   Iris-virginica  1.258946e-09     1.586186e-02    9.841381e-01
30   Iris-virginica  2.918212e-07     1.127815e-02    9.887216e-01
31   Iris-virginica  1.635366e-13     3.913354e-06    9.999961e-01
32   Iris-virginica  1.160129e-11     2.099658e-07    9.999998e-01
# mean:  [1.0]
# cbind: 
             predict            class
0       Iris-setosa      Iris-setosa
1       Iris-setosa      Iris-setosa
2       Iris-setosa      Iris-setosa
3       Iris-setosa      Iris-setosa
4       Iris-setosa      Iris-setosa
5       Iris-setosa      Iris-setosa
6       Iris-setosa      Iris-setosa
7       Iris-setosa      Iris-setosa
8       Iris-setosa      Iris-setosa
9       Iris-setosa      Iris-setosa
10  Iris-versicolor  Iris-versicolor
11  Iris-versicolor  Iris-versicolor
12  Iris-versicolor  Iris-versicolor
13  Iris-versicolor  Iris-versicolor
14  Iris-versicolor  Iris-versicolor
15  Iris-versicolor  Iris-versicolor
16  Iris-versicolor  Iris-versicolor
17  Iris-versicolor  Iris-versicolor
18  Iris-versicolor  Iris-versicolor
19  Iris-versicolor  Iris-versicolor
20  Iris-versicolor  Iris-versicolor
21   Iris-virginica   Iris-virginica
22   Iris-virginica   Iris-virginica
23   Iris-virginica   Iris-virginica
24   Iris-virginica   Iris-virginica
25   Iris-virginica   Iris-virginica
26   Iris-virginica   Iris-virginica
27   Iris-virginica   Iris-virginica
28   Iris-virginica   Iris-virginica
29   Iris-virginica   Iris-virginica
30   Iris-virginica   Iris-virginica
31   Iris-virginica   Iris-virginica
32   Iris-virginica   Iris-virginica
H2O session _sid_aa65 closed.

Process finished with exit code 0
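
To make the confusion matrix above easier to read, the Error column can be recomputed by hand: each row's error is the number of off-diagonal entries divided by the row total, and the last row gives the overall error. The sketch below copies the counts from the output above; error_rates is a hypothetical helper, not an H2O function.

```python
# Rows are actual classes, columns are predicted classes (counts copied from the output above).
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
confusion = [
    [40, 0, 0],
    [0, 34, 5],
    [0, 0, 38],
]


def error_rates(matrix):
    """Return (per-class error rates, overall error rate) for a square confusion matrix."""
    per_class = []
    for i, row in enumerate(matrix):
        wrong = sum(row) - row[i]  # everything off the diagonal in this row
        per_class.append(wrong / sum(row))
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return per_class, (total - correct) / total


per_class, overall = error_rates(confusion)
for label, err in zip(labels, per_class):
    print(f"{label}: {err:.6f}")
print(f"overall: {overall:.6f}")
```

The only mistakes on the training data are 5 Iris-versicolor rows predicted as Iris-virginica, which gives 5 / 39 for that class and 5 / 117 overall.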

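3 - Deep learning on the Iris dataset in Flow

Flow (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/flow.html#) is the name of the web interface that ships as part of H2O, so no extra installation step is needed. It can be used to:

  • View data uploaded through a client
  • Upload data directly
  • View models created through a client (including models that are still being built)
  • Create models directly
  • View predictions generated through a client
  • Make predictions directly

3.1 - Startup

Start H2O Flow by running the jar file directly: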

[Anliven@localhost Downloads]$ pwd
/home/Anliven/Downloads
[Anliven@localhost Downloads]$ ls -l
total 402984
drwxr-xr-x 5 Anliven Anliven        60 Jun 19 08:19 h2o-3.24.0.5
-rw-rw-r-- 1 Anliven Anliven 368257676 Jun 19 21:57 h2o-3.24.0.5.zip
drwxr-xr-x 5 Anliven Anliven        84 Dec 22  2017 h2o-bk
-rw-rw-rw- 1 Anliven Anliven  44392957 Jun 23 22:25 基于H2O的机器学习实用方法.zip
[Anliven@localhost Downloads]$ 
[Anliven@localhost Downloads]$ cd h2o-3.24.0.5/
[Anliven@localhost h2o-3.24.0.5]$ java -jar h2o.jar -ip 192.168.16.101 -port 54321
06-27 22:32:49.845 192.168.16.101:54321  3486   main      INFO: ----- H2O started  -----
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build git branch: rel-yates
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build git hash: b9cd4d5bcd44a4949ca8c677c5e54c10ee72c968
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build git describe: jenkins-3.24.0.4-66-gb9cd4d5
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build project version: 3.24.0.5
06-27 22:32:49.864 192.168.16.101:54321  3486   main      INFO: Build age: 8 days
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Built by: 'jenkins'
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Built on: '2019-06-18 23:52:14'
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Found H2O Core extensions: [Watchdog, XGBoost, KrbStandalone]
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Processed H2O arguments: [-ip, 192.168.16.101, -port, 54321]
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Java availableProcessors: 2
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Java heap totalMemory: 240.0 MB
06-27 22:32:49.865 192.168.16.101:54321  3486   main      INFO: Java heap maxMemory: 3.45 GB
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: Java version: Java 1.8.0_161 (from Oracle Corporation)
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: JVM launch parameters: []
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: OS version: Linux 3.10.0-957.el7.x86_64 (amd64)
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: Machine physical memory: 15.51 GB
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: Machine locale: en_US
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: X-h2o-cluster-id: 1561645969069
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: User name: 'Anliven'
06-27 22:32:49.866 192.168.16.101:54321  3486   main      INFO: IPv6 stack selected: false
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Network interface is down: name:virbr0 (virbr0)
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: enp0s8 (enp0s8), fe80:0:0:0:cfdd:6281:f738:fba%enp0s8
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: enp0s8 (enp0s8), 192.168.16.101
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: enp0s3 (enp0s3), fe80:0:0:0:c48f:c289:276:2308%enp0s3
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: enp0s3 (enp0s3), 10.0.2.15
06-27 22:32:49.867 192.168.16.101:54321  3486   main      INFO: Possible IP Address: lo (lo), 0:0:0:0:0:0:0:1%lo
06-27 22:32:49.868 192.168.16.101:54321  3486   main      INFO: Possible IP Address: lo (lo), 127.0.0.1
06-27 22:32:49.868 192.168.16.101:54321  3486   main      INFO: H2O node running in unencrypted mode.
06-27 22:32:49.869 192.168.16.101:54321  3486   main      INFO: Internal communication uses port: 54322
06-27 22:32:49.869 192.168.16.101:54321  3486   main      INFO: Listening for HTTP and REST traffic on http://192.168.16.101:54321/
06-27 22:32:49.870 192.168.16.101:54321  3486   main      INFO: H2O cloud name: 'Anliven' on /192.168.16.101:54321, static configuration based on -flatfile null
06-27 22:32:49.870 192.168.16.101:54321  3486   main      INFO: If you have trouble connecting, try SSH tunneling from your local machine (e.g., via port 55555):
06-27 22:32:49.870 192.168.16.101:54321  3486   main      INFO:   1. Open a terminal and run 'ssh -L 55555:localhost:54321 Anliven@192.168.16.101'
06-27 22:32:49.870 192.168.16.101:54321  3486   main      INFO:   2. Point your browser to http://localhost:55555
06-27 22:32:50.627 192.168.16.101:54321  3486   main      INFO: Log dir: '/tmp/h2o-Anliven/h2ologs'
06-27 22:32:50.627 192.168.16.101:54321  3486   main      INFO: Cur dir: '/home/Anliven/Downloads/h2o-3.24.0.5'
06-27 22:32:50.641 192.168.16.101:54321  3486   main      INFO: Subsystem for distributed import from HTTP/HTTPS successfully initialized
06-27 22:32:50.641 192.168.16.101:54321  3486   main      INFO: HDFS subsystem successfully initialized
06-27 22:32:50.645 192.168.16.101:54321  3486   main      INFO: S3 subsystem successfully initialized
06-27 22:32:50.663 192.168.16.101:54321  3486   main      INFO: GCS subsystem successfully initialized
06-27 22:32:50.663 192.168.16.101:54321  3486   main      INFO: Flow dir: '/home/Anliven/h2oflows'
06-27 22:32:50.681 192.168.16.101:54321  3486   main      INFO: Cloud of size 1 formed [/192.168.16.101:54321]
06-27 22:32:50.690 192.168.16.101:54321  3486   main      INFO: Registered parsers: [GUESS, ARFF, XLS, SVMLight, AVRO, PARQUET, CSV]
06-27 22:32:50.691 192.168.16.101:54321  3486   main      INFO: Watchdog extension initialized
06-27 22:32:50.692 192.168.16.101:54321  3486   main      INFO: XGBoost extension initialized
06-27 22:32:50.692 192.168.16.101:54321  3486   main      INFO: KrbStandalone extension initialized
06-27 22:32:50.692 192.168.16.101:54321  3486   main      INFO: Registered 3 core extensions in: 318ms
06-27 22:32:50.692 192.168.16.101:54321  3486   main      INFO: Registered H2O core extensions: [Watchdog, XGBoost, KrbStandalone]
06-27 22:32:51.041 192.168.16.101:54321  3486   main      INFO: Found XGBoost backend with library: xgboost4j_gpu
06-27 22:32:51.041 192.168.16.101:54321  3486   main      INFO: XGBoost supported backends: [WITH_GPU, WITH_OMP]
06-27 22:32:51.229 192.168.16.101:54321  3486   main      INFO: Registered: 174 REST APIs in: 537ms
06-27 22:32:51.229 192.168.16.101:54321  3486   main      INFO: Registered REST API extensions: [Amazon S3, XGBoost, Algos, AutoML, Core V3, Core V4]
06-27 22:32:51.492 192.168.16.101:54321  3486   main      INFO: Registered: 249 schemas in 263ms
06-27 22:32:51.493 192.168.16.101:54321  3486   main      INFO: H2O started in 2407ms
06-27 22:32:51.493 192.168.16.101:54321  3486   main      INFO: 
06-27 22:32:51.493 192.168.16.101:54321  3486   main      INFO: Open H2O Flow in your web browser: http://192.168.16.101:54321
06-27 22:32:51.493 192.168.16.101:54321  3486   main      INFO: 

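3.2 - Data

On the start page, click importFiles, or choose Data --> Import Files from the top menu.

In the Import Files dialog that appears, enter a path in Search and click the search button (the magnifying-glass icon). Then pick the data file from Search Results; Selected Files shows what has been chosen.

Note: the Search path can be the absolute path of the data file, or a path relative to the h2o.jar file, for example ../h2o-bk/datasets.

Click the Import button to see the result of the file import.

Click Parse these files to customize how the data file is imported. In most cases it is best to keep the defaults and simply click "Parse".

Click View or iris_wheader1.hex to see the details.

In Actions, click the Split... button and set how the data is divided into train and test sets.

Click the Create button.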

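3.3 - Model

Click "train", then click "Build Model..." to open the algorithm selection screen.

Select Deep learning and set the response_column parameter to class. Leave all other parameters at their defaults.

Then click the "Build Model" button at the bottom of the dialog to start training.

When training is finished, click the View button to inspect the parameters and the build process of the model.

If a model has been built earlier, choose Model ---> List All Models from the start page and click the model of interest to see its parameters and build process.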

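3.4 - Prediction

In the model view, click Predict..., then specify a name and a data set.

Alternatively, choose Score ---> Predict from the start page, then specify a name, a model, and a data set.

After confirming the parameters, click Predict to see the prediction results.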
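
The prediction result has the same shape as the Python output in section 2.2: one predict column plus one probability column per class. In every row of that output, the predicted label is the class with the highest probability. The sketch below illustrates this with the values of row 28 from section 2.2; predicted_label is a hypothetical helper, and the argmax rule is an observation about this three-class output rather than a statement about every H2O model type.

```python
def predicted_label(probabilities):
    """Return the class name with the largest probability."""
    return max(probabilities, key=probabilities.get)


# Class probabilities of row 28 in the section 2.2 output.
row_28 = {
    "Iris-setosa": 8.742579e-08,
    "Iris-versicolor": 2.479772e-01,
    "Iris-virginica": 7.520227e-01,
}

print(predicted_label(row_28))
```

Row 28 is the least confident prediction in that output: about 75% for Iris-virginica against about 25% for Iris-versicolor.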

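4 - Other notes

  • Flow can do most of what can be done from Python, but some data manipulation operations are not available in it.
  • Data loaded from Python can be inspected in Flow, and data loaded in Flow can be inspected from Python.
  • Water Meter, under the Admin menu, shows how busy each CPU core in the cluster is.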
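
As a rough sketch of the second point, a Python client can attach to the server started in section 3.1 and list the objects stored in the cluster, which include frames imported through Flow. The address comes from the startup log above. flow_url and list_cluster_keys are hypothetical helper names; the sketch assumes the h2o package is installed and that the server is reachable from the client machine.

```python
def flow_url(ip, port=54321):
    """Build the address that both the browser (Flow) and the Python client use."""
    return f"http://{ip}:{port}"


def list_cluster_keys(ip, port=54321):
    """Attach to an already running H2O server and return the keys it holds."""
    import h2o  # imported lazily so flow_url can be used without h2o installed

    h2o.connect(url=flow_url(ip, port))  # attach only; does not start a new server
    return h2o.ls()


print(flow_url("192.168.16.101"))
```

Calling list_cluster_keys("192.168.16.101") after importing a file in Flow should then list that frame among the keys.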