Spark on K8S Deployment Details

time: 2020-1-3

This article is based on an Alibaba Cloud ACK managed Kubernetes cluster.
It is divided into the following parts:

  • Installing spark-operator on ACK
  • A Spark wordcount job reading from and writing to OSS
  • Installing Spark History Server on ACK

Installing the Spark operator

Prepare the kubectl and Helm clients

  • Configure the kubectl client on a local or intranet machine.
  • Install Helm.

When working in the CloudShell that Aliyun provides, files are not persisted by default and the connection tends to time out, which can make the spark-operator installation fail; before reinstalling you then have to delete the spark-operator resources by hand.
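
If an installation does fail halfway, a rough cleanup sketch before retrying looks like this (the release name is whatever Helm generated for the failed install; the CRD names are the ones shipped with spark-on-k8s-operator):

# Find the generated release name of the failed install (Helm v2)
helm list
# Remove the release and its history
helm delete --purge <release-name>
# Remove the operator CRDs if they were left behind
kubectl delete crd sparkapplications.sparkoperator.k8s.io \
  scheduledsparkapplications.sparkoperator.k8s.io
kubectl delete namespace spark-operator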

How to install Helm:

mkdir -pv helm && cd helm
wget https://storage.googleapis.com/kubernetes-helm/helm-v2.9.1-linux-amd64.tar.gz
tar xf helm-v2.9.1-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin
rm -rf linux-amd64

# Check the version; the server version is not shown because the server (Tiller) has not been installed yet
helm version
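
Helm v2 also needs its in-cluster component, Tiller, before helm install can do anything. A minimal sketch, assuming cluster-admin rights (the ServiceAccount and binding names are just conventions; the incubator repo URL is the one in use at the time of writing):

# Give Tiller a service account with broad rights (fine for a test cluster)
kubectl create serviceaccount tiller -n kube-system
kubectl create clusterrolebinding tiller-cluster-admin \
  --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
# Install Tiller into the cluster
helm init --service-account tiller --wait
# Add the incubator repo, which hosts the sparkoperator chart
helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com
helm repo update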

Install the spark operator

helm install incubator/sparkoperator \
--namespace spark-operator \
--set sparkJobNamespace=default \
--set operatorImageName=registry-vpc.us-east-1.aliyuncs.com/eci_open/spark-operator \
--set operatorVersion=v1beta2-1.0.1-2.4.4 \
--set enableWebhook=true \
--set ingressUrlFormat="\{\{\$appName\}\}.<your ACK test domain>" \
--set enableBatchScheduler=true
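
A quick way to confirm the operator came up (the namespace is the one used above; pod names carry a generated suffix):

kubectl get pods -n spark-operator
kubectl get crd | grep sparkoperator.k8s.io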

Note:

  • operatorImageName: change the region here to the region your K8S cluster is in. The default Google-hosted image cannot be pulled, so we use the image Aliyun provides; the registry-vpc prefix means the image is pulled from the registry over the intranet.
  • ingressUrlFormat: the Aliyun K8S cluster provides a test domain; you can replace it with your own.

Once the installation finishes, we still need to create a serviceaccount by hand so that the Spark jobs submitted later have permission to create the pods, ConfigMaps and other resources for the driver and executors.

The following creates the default:spark serviceaccount and binds the required permissions.
Create spark-rbac.yaml and run kubectl apply -f spark-rbac.yaml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: default
  name: spark-role
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["services"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-role-binding
  namespace: default
subjects:
- kind: ServiceAccount
  name: spark
  namespace: default
roleRef:
  kind: Role
  name: spark-role
  apiGroup: rbac.authorization.k8s.io
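
After applying the file, a quick sanity check that the account exists and can actually create pods (system:serviceaccount:default:spark is how Kubernetes refers to this serviceaccount):

kubectl apply -f spark-rbac.yaml
kubectl get serviceaccount spark -n default
kubectl auth can-i create pods -n default \
  --as=system:serviceaccount:default:spark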

Spark wordcount reading and writing OSS

This breaks down into the following steps:

  • Prepare the jars that OSS access depends on
  • Prepare a core-site.xml that supports the OSS filesystem
  • Build a Spark container image that can read and write OSS
  • Prepare the wordcount job

Prepare the OSS-dependency jars

Reference: https://help.aliyun.com/document_detail/146237.html?spm=a2c4g.11186623.2.16.4dce2e14IGuHEv
The following commands download the jars that OSS access depends on:

wget "http://gosspublic.alicdn.com/hadoop-spark/hadoop-oss-hdp-2.6.1.0-129.tar.gz?spm=a2c4g.11186623.2.11.54b56c18VGGAzb&file=hadoop-oss-hdp-2.6.1.0-129.tar.gz" \
  -O hadoop-oss-hdp-2.6.1.0-129.tar.gz

tar -xvf hadoop-oss-hdp-2.6.1.0-129.tar.gz

hadoop-oss-hdp-2.6.1.0-129/
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ram-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-core-3.4.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-ecs-4.2.0.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-java-sdk-sts-3.0.0.jar
hadoop-oss-hdp-2.6.1.0-129/jdom-1.1.jar
hadoop-oss-hdp-2.6.1.0-129/aliyun-sdk-oss-3.4.1.jar
hadoop-oss-hdp-2.6.1.0-129/hadoop-aliyun-2.7.3.2.6.1.0-129.jar

Prepare core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <!-- OSS configuration -->
    <property>
        <name>fs.oss.impl</name>
        <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
    </property>
    <property>
        <name>fs.oss.endpoint</name>
        <value>oss-cn-hangzhou-internal.aliyuncs.com</value>
    </property>
    <property>
        <name>fs.oss.accessKeyId</name>
        <value>{temporary AccessKey ID}</value>
    </property>
    <property>
        <name>fs.oss.accessKeySecret</name>
        <value>{temporary AccessKey Secret}</value>
    </property>
    <property>
        <name>fs.oss.buffer.dir</name>
        <value>/tmp/oss</value>
    </property>
    <property>
        <name>fs.oss.connection.secure.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>fs.oss.connection.maximum</name>
        <value>2048</value>
    </property>
</configuration>

Build an image that can read and write OSS

Download and extract the Spark distribution

wget http://apache.communilink.net/spark/spark-3.0.0-preview/spark-3.0.0-preview-bin-hadoop2.7.tgz
tar -xzvf spark-3.0.0-preview-bin-hadoop2.7.tgz

Build and publish the image

Before building, you need a docker registry; it can be Docker Hub or the remote registry service Aliyun provides.
Here we use Aliyun's Container Registry.

  1. Log in to the registry with docker:
docker login --username=lanrish@1416336129779449 registry.us-east-1.aliyuncs.com

Note:

  • It is recommended to use sudo-less docker for the login; if you run sudo docker login instead, the current (non-root) user will not be able to build images afterwards. A sketch of the sudo-less setup follows this note.
  • registry.us-east-1.aliyuncs.com depends on the region you chose. By default the registry is reached over the public network; if you create the K8S cluster and the registry in the same region (i.e. on the same VPC), you can add -vpc after registry, i.e. registry-vpc.us-east-1.aliyuncs.com, so that the cluster pulls images quickly over the intranet.
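
A minimal sketch of the sudo-less docker setup mentioned above, assuming the docker group already exists (it does on standard Docker installations):

# Add the current user to the docker group, then start a new login shell
sudo usermod -aG docker $USER
newgrp docker
# docker should now work without sudo
docker info
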
  2. Build the Spark image. Change into the extracted Spark directory: cd spark-3.0.0-preview-bin-hadoop2.7
  3. Copy the OSS-dependency jars into the jars directory (see the copy sketch after the Dockerfile below).
  4. Put the OSS-enabled core-site.xml into the conf directory.
  5. Modify kubernetes/dockerfiles/spark/Dockerfile as follows.
    The key additions are the mkdir -p /opt/hadoop/conf, COPY conf/core-site.xml /opt/hadoop/conf and ENV HADOOP_CONF_DIR /opt/hadoop/conf lines, which let Spark load core-site.xml automatically through the HADOOP_CONF_DIR environment variable. The reason to go to this trouble instead of using a ConfigMap is that Spark 3.0 currently has a bug there; see: https://www.jianshu.com/p/d051aa95b241
FROM openjdk:8-jdk-slim

ARG spark_uid=185

# Before building the docker image, first build and make a Spark distribution following
# the instructions in http://spark.apache.org/docs/latest/building-spark.html.
# If this docker file is being used in the context of building your images from a Spark
# distribution, the docker build command should be invoked from the top level directory
# of the Spark distribution. E.g.:
# docker build -t spark:latest -f kubernetes/dockerfiles/spark/Dockerfile .

RUN set -ex && \
    apt-get update && \
    ln -s /lib /lib64 && \
    apt install -y bash tini libc6 libpam-modules krb5-user libnss3 && \
    mkdir -p /opt/spark && \
    mkdir -p /opt/spark/examples && \
    mkdir -p /opt/spark/work-dir && \
    mkdir -p /opt/hadoop/conf && \
    touch /opt/spark/RELEASE && \
    rm /bin/sh && \
    ln -sv /bin/bash /bin/sh && \
    echo "auth required pam_wheel.so use_uid" >> /etc/pam.d/su && \
    chgrp root /etc/passwd && chmod ug+rw /etc/passwd && \
    rm -rf /var/cache/apt/*

COPY jars /opt/spark/jars
COPY bin /opt/spark/bin
COPY sbin /opt/spark/sbin
COPY kubernetes/dockerfiles/spark/entrypoint.sh /opt/
COPY examples /opt/spark/examples
COPY kubernetes/tests /opt/spark/tests
COPY data /opt/spark/data
COPY conf/core-site.xml /opt/hadoop/conf

ENV SPARK_HOME /opt/spark
ENV HADOOP_HOME /opt/hadoop
ENV HADOOP_CONF_DIR /opt/hadoop/conf

WORKDIR /opt/spark/work-dir
RUN chmod g+w /opt/spark/work-dir

ENTRYPOINT [ "/opt/entrypoint.sh" ]

# Specify the User that the actual main process will run as
USER ${spark_uid}
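
The copy commands referenced in steps 3 and 4, assuming the OSS jars were extracted next to the Spark directory and core-site.xml was written one level up (adjust the paths to wherever you actually put them):

cd spark-3.0.0-preview-bin-hadoop2.7
# Step 3: drop the OSS jars into the distribution's jars directory
cp ../hadoop-oss-hdp-2.6.1.0-129/*.jar jars/
# Step 4: place the OSS-enabled core-site.xml into conf/
cp ../core-site.xml conf/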

  6. Build and push the image:

# Build the image
./bin/docker-image-tool.sh -r registry.us-east-1.aliyuncs.com/engineplus -t 3.0.0 build
# Push the image
docker push registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0

If extra dependencies need to be baked into the image, use the following approach:
build a custom image with the Dockerfile, from the current Spark directory spark-3.0.0-preview-bin-hadoop2.7:

docker build -t registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0 -f kubernetes/dockerfiles/spark/Dockerfile .

The custom dependencies themselves are declared in kubernetes/dockerfiles/spark/Dockerfile.
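
As a hypothetical variant that leaves the stock Dockerfile untouched, you can also layer extra packages on top of the image built above; the Dockerfile.custom name, the python3 package and the 3.0.0-custom tag below are all illustrative:

cat > kubernetes/dockerfiles/spark/Dockerfile.custom <<'EOF'
FROM registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0
USER root
# Illustrative extra dependency; replace with whatever the job needs
RUN apt-get update && apt-get install -y python3 && rm -rf /var/lib/apt/lists/*
USER 185
EOF

docker build -t registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-custom \
  -f kubernetes/dockerfiles/spark/Dockerfile.custom .
docker push registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-custom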

Prepare the wordcount job

The wordcount job can be cloned from: https://github.com/i-mine/spark_k8s_wordcount
After cloning, simply run mvn clean package
to get the wordcount jar: target/spark_k8s_wordcount-1.0-SNAPSHOT.jar

1. Submitting with spark-submit

Note: with this method a local jar can be uploaded, but the local environment doing the submit must already have the Hadoop OSS configuration in place.

bin/spark-submit \
--master k8s://https://192.168.17.175:6443 \
--deploy-mode cluster \
--name com.mobvista.dataplatform.WordCount \
--class com.mobvista.dataplatform.WordCount \
--conf spark.kubernetes.file.upload.path=oss://mob-emr-test/lei.du/tmp \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss \
/home/hadoop/dulei/spark-3.0.0-preview2-bin-hadoop2.7/spark_k8s_wordcount-1.0-SNAPSHOT.jar
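
While the job runs, the driver pod can be followed from kubectl (Spark labels the driver pod with spark-role=driver; the exact pod name is generated from the app name):

kubectl get pods -l spark-role=driver
kubectl logs -f <driver-pod-name>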

2. Submitting with the spark operator

Note: with this method the jars a job depends on must either already exist in the image or be reachable remotely; a local jar cannot be uploaded to the Spark job automatically, so you have to upload it to OSS or S3 yourself, and the Spark image must already contain the OSS/S3 access configuration and dependency jars.
Write the spark operator word-count.yaml; this approach requires the jar to be baked into the image in advance, or uploaded to the cloud.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: wordcount
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss"
  imagePullPolicy: IfNotPresent
  mainClass: com.mobvista.dataplatform.WordCount
  mainApplicationFile: "oss://mob-emr-test/lei.du/lib/spark_k8s_wordcount-1.0-SNAPSHOT.jar"
  sparkVersion: "3.0.0"
  restartPolicy:
    type: OnFailure
    onFailureRetries: 2
    onFailureRetryInterval: 5
    onSubmissionFailureRetries: 2
    onSubmissionFailureRetryInterval: 10
  timeToLiveSeconds: 3600
  sparkConf:
    "spark.kubernetes.allocation.batch.size": "10"
    "spark.eventLog.enabled": "true"
    "spark.eventLog.dir": "oss://mob-emr-test/lei.du/tmp/logs"
  hadoopConfigMap: oss-hadoop-dir
  driver:
    cores: 1
    memory: "1024m"
    labels:
      version: 3.0.0
      spark-app: spark-wordcount
      role: driver
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
    serviceAccount: spark
  executor:
    cores: 1
    instances: 1
    memory: "1024m"
    labels:
      version: 3.0.0
      role: executor
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
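
Submitting and checking on the application then looks like this (wordcount-driver is derived from the metadata.name above):

kubectl apply -f word-count.yaml
kubectl get sparkapplication wordcount
kubectl describe sparkapplication wordcount
kubectl logs -f wordcount-driver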

While the job is running we can use the ingress URL to open the Web UI and check its status, but it is no longer accessible once the job has finished:

$ kubectl describe sparkapplication
Name:         wordcount
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sparkoperator.k8s.io/v1beta2","kind":"SparkApplication","metadata":{"annotations":{},"name":"wordcount","namespace":"defaul...
API Version:  sparkoperator.k8s.io/v1beta2
Kind:         SparkApplication
Metadata:
  Creation Timestamp:  2020-01-03T08:18:58Z
  Generation:          2
  Resource Version:    53192098
  Self Link:           /apis/sparkoperator.k8s.io/v1beta2/namespaces/default/sparkapplications/wordcount
  UID:                 b0b1ff99-2e01-11ea-bf95-7e8505108e63
Spec:
  Driver:
    Annotations:
      k8s.aliyun.com/eci-image-cache:  true
    Cores:                             1
    Labels:
      Role:         driver
      Spark - App:  spark-wordcount
      Version:      3.0.0
    Memory:           1024m
    Service Account:  spark
  Executor:
    Annotations:
      k8s.aliyun.com/eci-image-cache:  true
    Cores:                             1
    Instances:                         1
    Labels:
      Role:     executor
      Version:  3.0.0
    Memory:     1024m
  Image:              registry.us-east-1.aliyuncs.com/engineplus/spark:3.0.0-oss-wordcount
  Image Pull Policy:  IfNotPresent
  Main Application File:  /opt/spark/jars/spark_k8s_wordcount-1.0-SNAPSHOT.jar
  Main Class:             WordCount
  Mode:                   cluster
  Restart Policy:
    On Failure Retries:                    2
    On Failure Retry Interval:             5
    On Submission Failure Retries:         2
    On Submission Failure Retry Interval:  10
    Type:                                  OnFailure
  Spark Conf:
    spark.kubernetes.allocation.batch.size:  10
  Spark Version:                             3.0.0
  Time To Live Seconds:                      3600
  Type:                                      Scala
Status:
  Application State:
    Error Message:  driver pod failed with ExitCode: 1, Reason: Error
    State:          FAILED
  Driver Info:
    Pod Name:                wordcount-driver
    Web UI Address:          172.21.14.219:4040
    Web UI Ingress Address:  wordcount.cac1e2ca4865f4164b9ce6dd46c769d59.us-east-1.alicontainer.com
    Web UI Ingress Name:     wordcount-ui-ingress
    Web UI Port:             4040
    Web UI Service Name:     wordcount-ui-svc
  Execution Attempts:            3
  Last Submission Attempt Time:  2020-01-03T08:21:51Z
  Spark Application Id:          spark-4c66cd4e3e094571844bbc355a1b6a16
  Submission Attempts:           1
  Submission ID:                 e4ce0cb8-7719-4c6f-ade1-4c13e137de77
  Termination Time:              2020-01-03T08:22:01Z
Events:
  Type     Reason                               Age                    From            Message
  ----     ------                               ----                   ----            -------
  Normal   SparkApplicationAdded                7m20s                  spark-operator  SparkApplication wordcount was added, enqueuing it for submission
  Warning  SparkApplicationFailed               6m20s                  spark-operator  SparkApplication wordcount failed: driver pod failed with ExitCode: 101, Reason: Error
  Normal   SparkApplicationSpecUpdateProcessed  5m43s                  spark-operator  Successfully processed spec update for SparkApplication wordcount
  Warning  SparkDriverFailed                    4m47s (x5 over 7m10s)  spark-operator  Driver wordcount-driver failed
  Warning  SparkApplicationPendingRerun         4m32s (x5 over 7m2s)   spark-operator  SparkApplication wordcount is pending rerun
  Normal   SparkApplicationSubmitted            4m27s (x6 over 7m16s)  spark-operator  SparkApplication wordcount was submitted successfully
  Normal   SparkDriverRunning                   4m24s (x6 over 7m14s)  spark-operator  Driver wordcount-driver is running

Installing Spark History Server on K8S

Here we use the Spark History Server provided as a Helm chart.
GitHub: https://github.com/SnappyDataInc/spark-on-k8s/tree/master/charts/spark-hs?spm=5176.2020520152.0.0.2d5916ddP2xqfh
For convenience, install it directly from Aliyun's application catalog:
App page: https://cs.console.aliyun.com/#/k8s/catalog/detail/incubator_ack-spark-history-server

Before creating it, fill in the OSS-related configuration, then create it.

Once installed, you can find the Spark History Server's access address by inspecting the Kubernetes Service the chart created.
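
For example (the Service name and namespace depend on the release name and namespace chosen during installation):

kubectl get svc --all-namespaces | grep -i history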

After it is created successfully, add two settings when submitting jobs:

 "spark.eventLog.enabled": "true"
 "spark.eventLog.dir": "oss://mob-emr-test/lei.du/tmp/logs"

The submitted jobs' event logs will then be stored in OSS.

