SparkMl-Interaction (Feature Interaction - Cartesian Product)

Original
2019/11/15 18:23

Interaction (Feature Interaction - Cartesian Product)

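Description: Interaction is a Transformer.

     It takes vector or double-valued columns and generates a single vector column that contains the product of every combination of one value from each input column. For example, given two vector columns with 3 dimensions each as input columns, the output column is a 9-dimensional vector.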
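
The product described above can be sketched in plain Scala without Spark. This is only an illustration: `InteractionSketch` and `interact` are hypothetical names and not part of the Spark API, and the input values are taken from the first row of the sample data further down. The ordering (earlier columns vary slowest, the last column varies fastest) is assumed from the sample output in this post.

```scala
// Plain-Scala sketch of the interaction product (no Spark required).
// `interact` is a hypothetical helper for illustration, not a Spark API.
object InteractionSketch {
  // Each inner Seq holds one input column's value for a single row;
  // a double-valued (scalar) column is represented as a one-element Seq.
  def interact(columns: Seq[Seq[Double]]): Seq[Double] =
    columns.foldLeft(Seq(1.0)) { (acc, col) =>
      // Multiply every partial product by every value of the next column.
      // Earlier columns vary slowest, the last column varies fastest.
      for (a <- acc; b <- col) yield a * b
    }

  def main(args: Array[String]): Unit = {
    val id1  = Seq(1.0)            // scalar column
    val vec1 = Seq(1.0, 2.0, 3.0)  // 3-dimensional vector column
    val vec2 = Seq(8.0, 4.0, 5.0)  // 3-dimensional vector column
    println(interact(Seq(id1, vec1, vec2)).mkString("[", ",", "]"))
    // prints [8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]
  }
}
```

The output size is the product of the input sizes, so one scalar and two 3-dimensional vectors give 1 x 3 x 3 = 9 values.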

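+------------+-------------+-----------------------------------------------------------------+
|Parameter   |Type         |Description                                                      |
+------------+-------------+-----------------------------------------------------------------+
|setInputCols|Array[String]|Input columns to interact; each must be of numeric or vector type|
|setOutputCol|String       |Name of the output column; its type is vector                    |
+------------+-------------+-----------------------------------------------------------------+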

Example program:

import org.apache.spark.ml.feature.{Interaction, VectorAssembler}
import org.apache.spark.sql.{DataFrame, SparkSession}

// getSparkSession() is a helper of the enclosing class (not shown in this post)
def getDataFrame(sparkSession: SparkSession = this.getSparkSession()): DataFrame = {
    sparkSession.createDataFrame(Seq(
        (1, 1, 2, 3, 8, 4, 5),
        (2, 4, 3, 8, 7, 9, 8),
        (3, 6, 1, 9, 2, 3, 6),
        (4, 10, 8, 6, 9, 4, 5),
        (5, 9, 2, 7, 10, 7, 3),
        (6, 1, 1, 4, 2, 8, 4)
    )).toDF("id1", "id2", "id3", "id4", "id5", "id6", "id7")
}

def execute(dataFrame: DataFrame): Unit = {
    // Preprocessing: assemble the scalar columns into two 3-dimensional vectors
    val assembler1 = new VectorAssembler().setInputCols(Array("id2", "id3", "id4")).setOutputCol("vec1")
    val assembler2 = new VectorAssembler().setInputCols(Array("id5", "id6", "id7")).setOutputCol("vec2")
    val assembled = assembler2.transform(assembler1.transform(dataFrame))
    // Feature Cartesian product: id1 (scalar) x vec1 x vec2
    val interaction = new Interaction()
        .setInputCols(Array("id1", "vec1", "vec2"))
        .setOutputCol("interactedCol")
    // Transform
    val interacted = interaction.transform(assembled)
    // Show the original data, the result, and the result schema
    dataFrame.show()
    interacted.show(truncate = false)
    interacted.printSchema()
}

Original data:

+---+---+---+---+---+---+---+
|id1|id2|id3|id4|id5|id6|id7|
+---+---+---+---+---+---+---+
|  1|  1|  2|  3|  8|  4|  5|
|  2|  4|  3|  8|  7|  9|  8|
|  3|  6|  1|  9|  2|  3|  6|
|  4| 10|  8|  6|  9|  4|  5|
|  5|  9|  2|  7| 10|  7|  3|
|  6|  1|  1|  4|  2|  8|  4|
+---+---+---+---+---+---+---+

Result:

+---+---+---+---+---+---+---+--------------+--------------+------------------------------------------------------+
|id1|id2|id3|id4|id5|id6|id7|vec1          |vec2          |interactedCol                                         |
+---+---+---+---+---+---+---+--------------+--------------+------------------------------------------------------+
|1  |1  |2  |3  |8  |4  |5  |[1.0,2.0,3.0] |[8.0,4.0,5.0] |[8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]            |
|2  |4  |3  |8  |7  |9  |8  |[4.0,3.0,8.0] |[7.0,9.0,8.0] |[56.0,72.0,64.0,42.0,54.0,48.0,112.0,144.0,128.0]     |
|3  |6  |1  |9  |2  |3  |6  |[6.0,1.0,9.0] |[2.0,3.0,6.0] |[36.0,54.0,108.0,6.0,9.0,18.0,54.0,81.0,162.0]        |
|4  |10 |8  |6  |9  |4  |5  |[10.0,8.0,6.0]|[9.0,4.0,5.0] |[360.0,160.0,200.0,288.0,128.0,160.0,216.0,96.0,120.0]|
|5  |9  |2  |7  |10 |7  |3  |[9.0,2.0,7.0] |[10.0,7.0,3.0]|[450.0,315.0,135.0,100.0,70.0,30.0,350.0,245.0,105.0] |
|6  |1  |1  |4  |2  |8  |4  |[1.0,1.0,4.0] |[2.0,8.0,4.0] |[12.0,48.0,24.0,12.0,48.0,24.0,48.0,192.0,96.0]       |
+---+---+---+---+---+---+---+--------------+--------------+------------------------------------------------------+

Practical application example:

      None yet.

More recommended posts:

      SparkML (2.1.0) Machine Learning Library Guide
