Kylin 2.5.0 Environment Setup and Operation Notes

PeakFang-BOK
Published 2018/12/14 15:18

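Apache Kylin is an open-source distributed analytics engine that provides a SQL query interface and multidimensional analysis (OLAP) on top of Hadoop/Spark to support extremely large datasets. It was originally developed by eBay Inc. and contributed to the open-source community. It can query huge Hive tables in sub-second time.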

Pseudo-distributed environment setup

hadoop-2.7.7 installation
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html

hive2.1.1
https://my.oschina.net/peakfang/blog/2236971

hbase1.2.6
http://hbase.apache.org/book.html#_introduction

kylin-2.5.0
http://kylin.apache.org/docs/install/index.html

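If the Spark in use was compiled from source without the Hive jars (for Hive on Spark), or is version 1.6.3, the SPARK_HOME directory has no jars directory, and starting Kylin reports the following error: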

find: ‘/usr/local/spark-1.6.3/jars’: No such file or directory

[root@node222 local]# vi kylin-2.5.0/bin/find-spark-dependency.sh
# On line 38, change jars to lib
[root@node222 local]# kylin-2.5.0/bin/kylin.sh  start
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin-2.5.0
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
Start to check whether we need to migrate acl tables
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin-2.5.0
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
......
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/kylin-2.5.0/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin
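
The manual edit above swaps one hard-coded directory name for another. As a rough sketch of the same idea, the helper below picks whichever jar directory exists under a Spark home, so the choice does not depend on the Spark version. The function name pick_spark_jar_dir is hypothetical and is not part of Kylin's find-spark-dependency.sh; the /usr/local/spark-1.6.3 default is taken from the error message above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kylin): print the directory that holds
# Spark's jars. Spark 2.x ships a "jars" directory; Spark 1.6.x uses "lib".
pick_spark_jar_dir() {
  spark_home="$1"
  if [ -d "${spark_home}/jars" ]; then
    echo "${spark_home}/jars"
  elif [ -d "${spark_home}/lib" ]; then
    echo "${spark_home}/lib"
  else
    echo "no jars or lib directory under ${spark_home}" >&2
    return 1
  fi
}

# Example call; the default path is only an illustration.
pick_spark_jar_dir "${SPARK_HOME:-/usr/local/spark-1.6.3}" || true
```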

Quick start

http://kylin.apache.org/docs/tutorial/kylin_sample.html

[root@node222 local]# kylin-2.5.0/bin/sample.sh
Retrieving hadoop conf dir...
Loading sample data into HDFS tmp path: /tmp/kylin/sample_cube/data
......
Loading data to table default.kylin_sales
OK
Time taken: 1.257 seconds
Loading data to table default.kylin_account
OK
Time taken: 0.455 seconds
Loading data to table default.kylin_country
OK
Time taken: 0.385 seconds
Loading data to table default.kylin_cal_dt
OK
Time taken: 0.579 seconds
Loading data to table default.kylin_category_groupings
OK
Time taken: 0.502 seconds
......
Sample cube is created successfully in project 'learn_kylin'.
Restart Kylin Server or click Web UI => System Tab => Reload Metadata to take effect

When building kylin_sales_cube through the web UI, if the following error appears, the historyserver service needs to be started:

Call From node222/192.168.0.222 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
java.io.IOException: java.net.ConnectException: Call From node222/192.168.0.222 to localhost:10020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
        at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:334)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:371)

# Start the historyserver service; the build then succeeds
[root@node222 ~]# /usr/local/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh  start  historyserver
starting historyserver, logging to /usr/local/hadoop-2.7.7/logs/mapred-root-historyserver-node222.out
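
Port 10020 in the stack trace is the default address of the MapReduce JobHistoryServer, so a quick check before building is whether that daemon is running. The sketch below is a hypothetical helper, assuming `jps` lists the daemon under its usual name JobHistoryServer; it only inspects text passed on standard input.

```shell
#!/bin/sh
# Hypothetical helper: succeed if jps-style output on stdin lists a
# JobHistoryServer process.
history_server_running() {
  grep -q 'JobHistoryServer'
}

# Demonstration with canned jps-style output (the PIDs are made up).
if printf '2101 NameNode\n2544 JobHistoryServer\n' | history_server_running; then
  echo "historyserver is running"
else
  echo "historyserver is not running"
fi
```

On a live node the check could be wired as `jps | history_server_running || /usr/local/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver`.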

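Build progress can be monitored on the Monitor page.

SQL can be run on the Insight page.

The queries Kylin can run must match the join types defined in the model: if the model defines only inner joins, Insight can run only inner joins, not left joins.

Results can be displayed with simple visualizations.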

Installing the system cube

http://kylin.apache.org/docs/tutorial/setup_systemcube.html

Create the configuration file SCSinkTools.json in the KYLIN_HOME directory:

[
  [
    "org.apache.kylin.tool.metrics.systemcube.util.HiveSinkTool",
    {
      "storage_type": 2,
      "cube_desc_override_properties": [
        "java.util.HashMap",
        {
          "kylin.cube.algorithm": "INMEM",
          "kylin.cube.max-building-segments": "1"
        }
      ]
    }
  ]
]

Generate the metadata

[root@node222 kylin-2.5.0]# ./bin/kylin.sh org.apache.kylin.tool.metrics.systemcube.SCCreator -inputConfig SCSinkTools.json -output system_cube
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin-2.5.0
Retrieving hive dependency...
Retrieving hbase dependency...
Retrieving hadoop conf dir...
Retrieving kafka dependency...
Retrieving Spark dependency...
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/kylin-2.5.0/tool/kylin-tool-2.5.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-12-14 10:56:36,096 INFO  [main] common.KylinConfig:332 : Loading kylin-defaults.properties from file:/usr/local/kylin-2.5.0/tool/kylin-tool-2.5.0.jar!/kylin-defaults.properties
2018-12-14 10:56:36,136 DEBUG [main] common.KylinConfig:291 : KYLIN_CONF property was not set, will seek KYLIN_HOME env variable
2018-12-14 10:56:36,144 INFO  [main] common.KylinConfig:99 : Initialized a new KylinConfig from getInstanceFromEnv : 1987083830
Running org.apache.kylin.tool.metrics.systemcube.SCCreator -inputConfig SCSinkTools.json -output system_cube
2018-12-14 10:56:36,931 INFO  [main] measure.MeasureTypeFactory:116 : Checking custom measure types from kylin config
2018-12-14 10:56:36,934 INFO  [main] measure.MeasureTypeFactory:145 : registering COUNT_DISTINCT(hllc), class org.apache.kylin.measure.hllc.HLLCMeasureType$Factory
2018-12-14 10:56:36,985 INFO  [main] measure.MeasureTypeFactory:145 : registering COUNT_DISTINCT(bitmap), class org.apache.kylin.measure.bitmap.BitmapMeasureType$Factory
2018-12-14 10:56:37,001 INFO  [main] measure.MeasureTypeFactory:145 : registering TOP_N(topn), class org.apache.kylin.measure.topn.TopNMeasureType$Factory
2018-12-14 10:56:37,006 INFO  [main] measure.MeasureTypeFactory:145 : registering RAW(raw), class org.apache.kylin.measure.raw.RawMeasureType$Factory
2018-12-14 10:56:37,009 INFO  [main] measure.MeasureTypeFactory:145 : registering EXTENDED_COLUMN(extendedcolumn), class org.apache.kylin.measure.extendedcolumn.ExtendedColumnMeasureType$Factory
2018-12-14 10:56:37,011 INFO  [main] measure.MeasureTypeFactory:145 : registering PERCENTILE_APPROX(percentile), class org.apache.kylin.measure.percentile.PercentileMeasureType$Factory
2018-12-14 10:56:37,014 INFO  [main] measure.MeasureTypeFactory:145 : registering COUNT_DISTINCT(dim_dc), class org.apache.kylin.measure.dim.DimCountDistinctMeasureType$Factory

[root@node222 kylin-2.5.0]# ll system_cube/
total 20
-rw-r--r-- 1 root root 3282 Dec 14 10:56 create_hive_tables_for_system_cubes.sql
drwxr-xr-x 2 root root 4096 Dec 14 10:56 cube
drwxr-xr-x 2 root root 4096 Dec 14 10:56 cube_desc
drwxr-xr-x 2 root root 4096 Dec 14 10:56 model_desc
drwxr-xr-x 2 root root   30 Dec 14 10:56 project
drwxr-xr-x 2 root root 4096 Dec 14 10:56 table

Create the Hive tables

[root@node222 kylin-2.5.0]# hive -f system_cube/create_hive_tables_for_system_cubes.sql

Logging initialized using configuration in jar:file:/usr/local/hive-2.1.1/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
OK
Time taken: 2.099 seconds
OK
Time taken: 0.162 seconds
OK
Time taken: 0.741 seconds
OK
Time taken: 0.028 seconds
OK
Time taken: 0.169 seconds
OK
Time taken: 0.027 seconds
OK
Time taken: 0.134 seconds
OK
Time taken: 0.033 seconds
OK
Time taken: 0.15 seconds
OK
Time taken: 0.026 seconds
OK
Time taken: 0.116 seconds

hive> use kylin;
OK
Time taken: 0.053 seconds
hive> show tables;
OK
hive_metrics_job_exception_qa
hive_metrics_job_qa
hive_metrics_query_cube_qa
hive_metrics_query_qa
hive_metrics_query_rpc_qa
Time taken: 0.11 seconds, Fetched: 5 row(s)

Update the metadata

[root@node222 kylin-2.5.0]# ./bin/metastore.sh restore system_cube
Starting restoring system_cube
Retrieving hadoop conf dir...
KYLIN_HOME is set to /usr/local/kylin-2.5.0
......
2018-12-14 11:02:40,126 INFO  [main-EventThread] zookeeper.ClientCnxn:512 : EventThread shut down

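Refresh the metadata

On the System page, click Reload Metadata.

Build the system cubes

Building directly, whether through the web UI or the script, fails with an error.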


Check /usr/local/kylin-2.5.0/logs/system_cube_KYLIN_HIVE_METRICS_QUERY_QA_1544756400000.log:
2018-12-14 11:17:51,783 ERROR [main] job.CubeBuildingCLI:134 : error start cube building
java.lang.RuntimeException: error execute org.apache.kylin.tool.job.CubeBuildingCLI. Root cause: Inconsistent cube desc signature for CubeDesc [name=KYLIN_HIVE_METRICS_QUERY_QA]

Re-save each cube in the web UI, then build again.

Create the build script

#!/bin/bash
# Usage: system_cube_build.sh <cube> <interval_ms> <delay_ms>

dir=$(dirname "${0}")
export KYLIN_HOME="${dir}/../"

CUBE=$1
INTERVAL=$2
DELAY=$3
# Current time in milliseconds
CURRENT_TIME_IN_SECOND=$(date +%s)
CURRENT_TIME=$((CURRENT_TIME_IN_SECOND * 1000))
# Step back by DELAY, then round down to a multiple of INTERVAL
END_TIME=$((CURRENT_TIME-DELAY))
END=$((END_TIME - END_TIME%INTERVAL))

ID="$END"
echo "building for ${CUBE}_${ID}" >> "${KYLIN_HOME}/logs/build_trace.log"
sh "${KYLIN_HOME}/bin/kylin.sh" org.apache.kylin.tool.job.CubeBuildingCLI --cube "${CUBE}" --endTime "${END}" > "${KYLIN_HOME}/logs/system_cube_${CUBE}_${END}.log" 2>&1 &
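
The end-time arithmetic in the script is easier to follow with concrete numbers. The sketch below isolates it in a function (the name compute_end is hypothetical) and takes the current time as an argument instead of calling `date`, so the result is reproducible. The sample timestamp is made up for illustration; the interval and delay are the values used in the test run in these notes.

```shell
#!/bin/sh
# Same arithmetic as the build script, isolated so it can be tried by hand.
# Arguments: current time, interval, and delay, all in milliseconds.
compute_end() {
  current_ms="$1"
  interval_ms="$2"
  delay_ms="$3"
  end_time=$((current_ms - delay_ms))
  # Round down to the nearest multiple of the interval.
  echo $((end_time - end_time % interval_ms))
}

# 1544758000000 - 1200000 = 1544756800000, then rounded down to the hour:
compute_end 1544758000000 3600000 1200000   # prints 1544756400000
```

With a one-hour interval and a 20-minute delay, a segment's end time only advances to a new hour boundary once at least 20 minutes have passed since that boundary.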

Test the build:
[root@node222 kylin-2.5.0]# ./bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_QA 3600000 1200000

Configure it as a scheduled task so that it runs automatically

[root@node222 kylin-2.5.0]# cat conf/schedule_system_cube_build.cron
0 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_QA 3600000 1200000

20 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_CUBE_QA 3600000 1200000

40 */4 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_RPC_QA 3600000 1200000

30 */4 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_QA 3600000 1200000

50 */12 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_EXCEPTION_QA 3600000 12000
[root@node222 kylin-2.5.0]# crontab  conf/schedule_system_cube_build.cron
[root@node222 kylin-2.5.0]# crontab  -l
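
One thing to watch: cron usually runs jobs with a minimal environment, so `${KYLIN_HOME}` in the entries above may expand to an empty string unless it is defined for cron. A possible fix, sketched below under that assumption, is to put a `KYLIN_HOME=...` line at the top of the cron file before loading it with `crontab`. The helper name ensure_cron_env is hypothetical; the install path is the one used throughout these notes.

```shell
#!/bin/sh
# Hypothetical helper: make sure a crontab file defines KYLIN_HOME on its
# own line, so the job commands do not rely on the login environment.
ensure_cron_env() {
  cron_file="$1"
  kylin_home="$2"
  if ! grep -q '^KYLIN_HOME=' "$cron_file"; then
    tmp_file="${cron_file}.tmp"
    { echo "KYLIN_HOME=${kylin_home}"; cat "$cron_file"; } > "$tmp_file" &&
      mv "$tmp_file" "$cron_file"
  fi
}

# Demonstration on a scratch file rather than the real cron file.
demo=$(mktemp)
echo '0 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_QA 3600000 1200000' > "$demo"
ensure_cron_env "$demo" /usr/local/kylin-2.5.0
head -n 1 "$demo"   # prints KYLIN_HOME=/usr/local/kylin-2.5.0
rm -f "$demo"
```

The helper is idempotent: running it again on a file that already has a KYLIN_HOME line leaves the file unchanged.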

After the build completes

Enable the dashboard

http://kylin.apache.org/docs/tutorial/use_dashboard.html

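Throughout the whole process, execution logs can be viewed in KYLIN_HOME/logs/kylin.log.

If the server was shut down, restarting the Kylin service may report the following error: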

[root@node222 ~]# kylin.sh  start
Retrieving hadoop conf dir...

......

Exception in thread "main" java.lang.IllegalArgumentException: Failed to find metadata store by url: kylin_metadata@hbase
        at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:98)
        at org.apache.kylin.common.persistence.ResourceStore.getStore(ResourceStore.java:110)
        at org.apache.kylin.rest.service.AclTableMigrationTool.checkIfNeedMigrate(AclTableMigrationTool.java:98)
        at org.apache.kylin.tool.AclTableMigrationCLI.main(AclTableMigrationCLI.java:41)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.kylin.common.persistence.ResourceStore.createResourceStore(ResourceStore.java:92)
        ... 3 more

At this point, entering Hive with the hive command reports the following error:

[root@node222 ~]# hive

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/hive/root/c5870040-12ed-47ab-bbc2-84d6ff3f2d24. Name node is in safe mode.
The reported blocks 709 needs additional 410 blocks to reach the threshold 0.9990 of total blocks 1120.
The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1335)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3874)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:984)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:634)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)

Leave safe mode

[root@node222 ~]# hdfs dfsadmin -safemode leave
Safe mode is OFF
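
The error text notes that safe mode turns off automatically once enough blocks are reported, so forcing it off is not always necessary, and doing so while blocks are still missing may hide a real problem. As a hedged sketch, the helper below interprets the status line printed by `hdfs dfsadmin -safemode get`. The function name safemode_is_on is hypothetical, and the expected wording ("Safe mode is ON" or "Safe mode is OFF") is assumed from the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the status text on stdin says safe mode
# is still on. Intended input: output of `hdfs dfsadmin -safemode get`.
safemode_is_on() {
  grep -q 'Safe mode is ON'
}

# Demonstration with a canned status line.
if echo 'Safe mode is OFF' | safemode_is_on; then
  echo "still in safe mode"
else
  echo "safe mode is off"
fi
```

On a live cluster, `hdfs dfsadmin -safemode wait` blocks until the NameNode leaves safe mode on its own, which is an alternative to `-safemode leave` when the DataNode is simply still reporting blocks.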

After that, restarting brings Kylin up normally.

[root@node222 ~]# kylin.sh  start        
A new Kylin instance is started by root. To stop it, run 'kylin.sh stop'
Check the log at /usr/local/kylin-2.5.0/logs/kylin.log
Web UI is at http://<hostname>:7070/kylin
        
jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .
hadoop fs -mkdir -p /kylin/spark/
hadoop fs -put spark-libs.jar /kylin/spark/

© Copyright belongs to the author
