Wine Quality Prediction

kunping
Published 2017/03/26 20:48

1. Download the Wine Quality Data Set

2. Delete the first line (the header row) of the CSV file
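
The header has to go because the program below calls `toDouble` on every field, and a line of column names would throw a `NumberFormatException`. As an alternative to editing the file by hand, the parser can skip any row that is not fully numeric. A minimal sketch in plain Scala (no Spark), where `HeaderCheck`, `parseRow`, and the sample header text are hypothetical names and values chosen for illustration:

```scala
import scala.util.Try

object HeaderCheck {
  // Column names assumed for illustration; check them against your copy of the file.
  val sampleHeader: String = Seq(
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality"
  ).map(name => "\"" + name + "\"").mkString(";")

  // Values taken from the first test row used later in this post.
  val sampleRow: String = "7.4;0.7;0.0;1.9;0.076;25.0;67.0;0.9968;3.2;0.68;9.8;5.0"

  // Returns Some((label, features)) for a numeric 12-field row, None otherwise.
  def parseRow(line: String): Option[(Double, Array[Double])] = {
    val fields = line.split(";").map(_.trim)
    if (fields.length != 12) None
    else Try(fields.map(_.toDouble)).toOption.map(values => (values(11), values.take(11)))
  }

  def main(args: Array[String]): Unit = {
    println(parseRow(sampleHeader).isDefined) // the header line is rejected
    println(parseRow(sampleRow).map(_._1))    // label of the numeric row
  }
}
```

In the Spark job, a function like this could be applied with `flatMap` in place of the `map` calls, so that unparseable lines are dropped instead of failing the job.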

3. Write the Spark code

package com.spark.machine.learning

import org.apache.log4j.{Level, Logger}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

/**
  * Created by 黄坤平 on 2017/3/26.
  */
object WinePredicted extends App {

  Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
  Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

  // Create the SparkSession
  val spark = SparkSession
    .builder()
    .master("local")
    .appName("WinePredicted")
    .config("spark.some.config.option", "some-value")
    .getOrCreate();

  import spark.implicits._

  // Read the training data and convert it to a DataFrame (column 12 is the label, columns 1-11 are the features)
  val trainingDF = spark.sparkContext.textFile("D:\\winequality\\winequality-red.csv").map(_.split(";"))
    .map(w => (w(11).toDouble, Vectors.dense(w(0).toDouble, w(1).toDouble, w(2).toDouble,
      w(3).toDouble, w(4).toDouble, w(5).toDouble, w(6).toDouble, w(7).toDouble,
      w(8).toDouble, w(9).toDouble, w(10).toDouble))).toDF("label", "features")

  // Show the training data
  trainingDF.show()

  // Create a LinearRegression estimator, set the maximum number of iterations, and call fit(training data) to build the model
  val model = new LinearRegression().setMaxIter(10).fit(trainingDF)

  // Test the model on a few labeled rows
  val testDF = spark.createDataFrame(Seq((5.0, Vectors.dense(7.4,
    0.7, 0.0, 1.9, 0.076, 25.0, 67.0, 0.9968, 3.2, 0.68, 9.8)), (5.0,
    Vectors.dense(7.8, 0.88, 0.0, 2.6, 0.098, 11.0, 34.0, 0.9978, 3.51, 0.56,
      9.4)), (7.0, Vectors.dense(7.3, 0.65, 0.0, 1.2, 0.065, 15.0, 18.0, 0.9968,
    3.36, 0.57, 9.5)))).toDF("label", "features")
  testDF.show()
  testDF.createOrReplaceTempView("test")
  // Get the predictions alongside the known labels
  val tested = model.transform(testDF).select("features", "label", "prediction")
  tested.show()

  // Select the data to predict on (features only, no label)
  val predictDF = spark.sql("SELECT features FROM test")
  predictDF.show()

  // Predict
  val predicted = model.transform(predictDF).select("features", "prediction")
  predicted.show()
}

4. Output


//trainingDF.show()
+-----+--------------------+
|label|            features|
+-----+--------------------+
|  5.0|[7.4,0.7,0.0,1.9,...|
|  5.0|[7.8,0.88,0.0,2.6...|
|  5.0|[7.8,0.76,0.04,2....|
|  6.0|[11.2,0.28,0.56,1...|
|  5.0|[7.4,0.7,0.0,1.9,...|
|  5.0|[7.4,0.66,0.0,1.8...|
|  5.0|[7.9,0.6,0.06,1.6...|
|  7.0|[7.3,0.65,0.0,1.2...|
|  7.0|[7.8,0.58,0.02,2....|
|  5.0|[7.5,0.5,0.36,6.1...|
|  5.0|[6.7,0.58,0.08,1....|
|  5.0|[7.5,0.5,0.36,6.1...|
|  5.0|[5.6,0.615,0.0,1....|
|  5.0|[7.8,0.61,0.29,1....|
|  5.0|[8.9,0.62,0.18,3....|
|  5.0|[8.9,0.62,0.19,3....|
|  7.0|[8.5,0.28,0.56,1....|
|  5.0|[8.1,0.56,0.28,1....|
|  4.0|[7.4,0.59,0.08,4....|
|  6.0|[7.9,0.32,0.51,1....|
+-----+--------------------+
only showing top 20 rows

//testDF.show()
+-----+--------------------+
|label|            features|
+-----+--------------------+
|  5.0|[7.4,0.7,0.0,1.9,...|
|  5.0|[7.8,0.88,0.0,2.6...|
|  7.0|[7.3,0.65,0.0,1.2...|
+-----+--------------------+

//tested.show()
+--------------------+-----+-----------------+
|            features|label|       prediction|
+--------------------+-----+-----------------+
|[7.4,0.7,0.0,1.9,...|  5.0|5.352730835965481|
|[7.8,0.88,0.0,2.6...|  5.0|4.817999361975048|
|[7.3,0.65,0.0,1.2...|  7.0|5.280106355690734|
+--------------------+-----+-----------------+

//predictDF.show()
+--------------------+
|            features|
+--------------------+
|[7.4,0.7,0.0,1.9,...|
|[7.8,0.88,0.0,2.6...|
|[7.3,0.65,0.0,1.2...|
+--------------------+

//predicted.show()
+--------------------+-----------------+
|            features|       prediction|
+--------------------+-----------------+
|[7.4,0.7,0.0,1.9,...|5.352730835965481|
|[7.8,0.88,0.0,2.6...|4.817999361975048|
|[7.3,0.65,0.0,1.2...|5.280106355690734|
+--------------------+-----------------+


Process finished with exit code 0
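
The `tested.show()` table puts each label next to its prediction, and one common way to condense such a comparison into a single number is the root-mean-square error (RMSE). The sketch below is plain Scala with no Spark dependency; `RmseSketch` and `rmse` are hypothetical names, and the input pairs are copied from the output above. Three rows are far too few for a meaningful estimate, so treat this only as an illustration of the arithmetic:

```scala
object RmseSketch {
  // Root-mean-square error over (label, prediction) pairs.
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (label, prediction) pair")
    val meanSquaredError = pairs.map { case (label, prediction) =>
      val error = label - prediction
      error * error
    }.sum / pairs.size
    math.sqrt(meanSquaredError)
  }

  def main(args: Array[String]): Unit = {
    // (label, prediction) pairs copied from the tested.show() output.
    val pairs = Seq(
      (5.0, 5.352730835965481),
      (5.0, 4.817999361975048),
      (7.0, 5.280106355690734)
    )
    println(f"RMSE on the three test rows: ${rmse(pairs)}%.4f")
  }
}
```

For these three rows the error comes out at roughly one quality point, dominated by the third row, where a wine labeled 7.0 is predicted at about 5.28.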

5. Summary (create the training data => create a LinearRegression object and fit the model => predict (or test))

#Main steps
	a. Create the training DataFrame
		val trainingDF = spark.sparkContext.textFile("D:\\winequality\\winequality-red.csv").map(_.split(";"))
			.map(w => (w(11).toDouble, Vectors.dense(w(0).toDouble, w(1).toDouble, w(2).toDouble,
			w(3).toDouble, w(4).toDouble, w(5).toDouble, w(6).toDouble, w(7).toDouble,
			w(8).toDouble, w(9).toDouble, w(10).toDouble))).toDF("label", "features")
	b. Create a LinearRegression object, set the maximum number of iterations, and call fit(training data) to build the model
		val model = new LinearRegression().setMaxIter(10).fit(trainingDF)
	c. Use the model to predict (or test)
		val predicted = model.transform(predictDF).select("features", "prediction")
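
For a fitted linear regression model, the prediction step amounts to a weighted sum of the features plus an intercept. In Spark the learned values are exposed as `model.coefficients` and `model.intercept`. The sketch below reproduces that arithmetic in plain Scala; `LinearPredictSketch` and `predict` are hypothetical names, and the weights are made-up numbers for illustration, not the ones learned in this post:

```scala
object LinearPredictSketch {
  // prediction = intercept + sum over i of coefficients(i) * features(i)
  def predict(coefficients: Array[Double], intercept: Double, features: Array[Double]): Double = {
    require(coefficients.length == features.length, "one coefficient per feature")
    intercept + coefficients.zip(features).map { case (w, x) => w * x }.sum
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical values for illustration only, NOT the weights learned above.
    val coefficients = Array(0.5, -1.0, 2.0)
    val intercept = 4.0
    val features = Array(2.0, 3.0, 0.5)
    // 4.0 + (0.5 * 2.0) + (-1.0 * 3.0) + (2.0 * 0.5)
    println(predict(coefficients, intercept, features))
  }
}
```

The wine model works the same way, only with 11 coefficients, one per feature column.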

6. Maven dependencies

    <properties>
      <scala.version>2.11.8</scala.version>
      <spark.version>2.1.0</spark.version>
    </properties>

    <!-- scala -->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <!-- spark -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>

    <!-- spark mllib -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-mllib_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>

 

© Copyright belongs to the author
