Computing Maximum Consecutive Days in HiveQL

K_Zhiqiang
Published 2017/08/10 10:59

We start from a file of per-merchant daily transaction totals, merch_trade_day_stat.txt. Its three comma-separated columns are merchant ID, transaction date, and daily transaction amount:

[root@node1 ~]$ more merch_trade_day_stat.txt
1,2017-07-01,100
1,2017-07-02,200
1,2017-07-03,300
1,2017-07-04,400
1,2017-07-05,500
1,2017-07-06,600
1,2017-07-07,40
1,2017-07-08,800
1,2017-07-09,900
1,2017-07-10,50
1,2017-07-11,100
1,2017-07-12,80
1,2017-07-13,300
1,2017-07-14,400
1,2017-07-15,500
2,2017-07-01,100
2,2017-07-02,200
2,2017-07-03,300
2,2017-07-04,60
2,2017-07-05,500
2,2017-07-06,600
2,2017-07-07,40
2,2017-07-08,800
2,2017-07-09,900
2,2017-07-10,50
2,2017-07-11,100
2,2017-07-12,80
2,2017-07-13,300
2,2017-07-14,400
2,2017-07-15,500

Compute, for each merchant, the maximum number of consecutive days on which the daily transaction amount was at least 100:

hive> CREATE TABLE merch_trade_day_stat(
    >     merch_id string COMMENT 'merchant ID',
    >     date_key string COMMENT 'transaction date',
    >     tx_amt int COMMENT 'daily transaction amount'
    > ) ROW FORMAT DELIMITED
    >  FIELDS TERMINATED BY ',';
OK
Time taken: 0.375 seconds
hive> load data local inpath 'merch_trade_day_stat.txt' into table merch_trade_day_stat;
Loading data to table default.merch_trade_day_stat
Table default.merch_trade_day_stat stats: [numFiles=1, totalSize=443]
OK
Time taken: 0.45 seconds
hive> select * from merch_trade_day_stat;
OK
1	2017-07-01	100
1	2017-07-02	200
1	2017-07-03	300
1	2017-07-04	400
1	2017-07-05	500
1	2017-07-06	600
1	2017-07-07	40
1	2017-07-08	800
1	2017-07-09	900
1	2017-07-10	50
1	2017-07-11	100
1	2017-07-12	80
1	2017-07-13	300
1	2017-07-14	400
1	2017-07-15	500
2	2017-07-01	100
2	2017-07-02	200
2	2017-07-03	300
2	2017-07-04	60
2	2017-07-05	500
2	2017-07-06	600
2	2017-07-07	40
2	2017-07-08	800
2	2017-07-09	900
2	2017-07-10	50
2	2017-07-11	100
2	2017-07-12	80
2	2017-07-13	300
2	2017-07-14	400
2	2017-07-15	500
Time taken: 0.069 seconds, Fetched: 30 row(s)
hive> select a.merch_id,
    >        max(a.continue_days) as max_continue_days
    >     from(select a.merch_id,
    >                 count(a.date_key) as continue_days
    >             from(select merch_id,
    >                         date_key,
    >                         date_sub(date_key, row_number() over(partition by merch_id order by date_key)) as tmp_date
    >                     from merch_trade_day_stat
    >                     where tx_amt >= 100
    >                 ) a
    >             group by a.merch_id, a.tmp_date
    >         ) a
    >     group by a.merch_id;
Query ID = bd_20170810104913_29569c95-1110-4ed4-906e-b09ba6712ac7
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1499860627544_0066, Tracking URL = http://ali-bj01-tst-cluster-004.xiweiai.cn:8088/proxy/application_1499860627544_0066/
Kill Command = /mnt/bd/software/hadoop/hadoop-2.6.2/bin/hadoop job  -kill job_1499860627544_0066
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-08-10 10:49:18,583 Stage-1 map = 0%,  reduce = 0%
2017-08-10 10:49:24,792 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.26 sec
2017-08-10 10:49:29,929 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.85 sec
MapReduce Total cumulative CPU time: 2 seconds 850 msec
Ended Job = job_1499860627544_0066
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1499860627544_0067, Tracking URL = http://ali-bj01-tst-cluster-004.xiweiai.cn:8088/proxy/application_1499860627544_0067/
Kill Command = /mnt/bd/software/hadoop/hadoop-2.6.2/bin/hadoop job  -kill job_1499860627544_0067
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2017-08-10 10:49:40,581 Stage-2 map = 0%,  reduce = 0%
2017-08-10 10:49:44,691 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 0.74 sec
2017-08-10 10:49:49,826 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 1.93 sec
MapReduce Total cumulative CPU time: 1 seconds 930 msec
Ended Job = job_1499860627544_0067
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.85 sec   HDFS Read: 9593 HDFS Write: 321 SUCCESS
Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 1.93 sec   HDFS Read: 6039 HDFS Write: 8 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 780 msec
OK
1	6
2	3
Time taken: 37.05 seconds, Fetched: 2 row(s)
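The query is an instance of the classic "gaps and islands" trick: within each merchant, rows that qualify (tx_amt >= 100) are numbered by date with row_number(), and date_sub(date_key, row_number()) is then constant exactly across one run of consecutive dates. Grouping by that derived anchor date isolates each streak, COUNT gives the streak length, and an outer MAX picks the longest. The same logic can be sketched in plain Python (the helper name max_consecutive_days is mine, not from the post):

```python
from datetime import date, timedelta
from itertools import groupby

# Rows mirroring merch_trade_day_stat.txt: merchant ID, date, daily amount
RAW = """1,2017-07-01,100
1,2017-07-02,200
1,2017-07-03,300
1,2017-07-04,400
1,2017-07-05,500
1,2017-07-06,600
1,2017-07-07,40
1,2017-07-08,800
1,2017-07-09,900
1,2017-07-10,50
1,2017-07-11,100
1,2017-07-12,80
1,2017-07-13,300
1,2017-07-14,400
1,2017-07-15,500
2,2017-07-01,100
2,2017-07-02,200
2,2017-07-03,300
2,2017-07-04,60
2,2017-07-05,500
2,2017-07-06,600
2,2017-07-07,40
2,2017-07-08,800
2,2017-07-09,900
2,2017-07-10,50
2,2017-07-11,100
2,2017-07-12,80
2,2017-07-13,300
2,2017-07-14,400
2,2017-07-15,500"""

rows = []
for line in RAW.splitlines():
    merch, d, amt = line.split(",")
    rows.append((merch, date.fromisoformat(d), int(amt)))

def max_consecutive_days(rows, min_amt=100):
    """Longest run of consecutive days with amount >= min_amt, per merchant."""
    # Filter and sort, like WHERE tx_amt >= 100 ... ORDER BY merch_id, date_key
    kept = sorted((m, d) for m, d, amt in rows if amt >= min_amt)
    result = {}
    for merch, grp in groupby(kept, key=lambda r: r[0]):
        dates = [d for _, d in grp]
        # date_sub(date_key, row_number()): this anchor is constant
        # exactly within one run of consecutive dates
        anchors = [d - timedelta(days=i + 1) for i, d in enumerate(dates)]
        streaks = {}
        for a in anchors:
            streaks[a] = streaks.get(a, 0) + 1   # COUNT(date_key) per anchor
        result[merch] = max(streaks.values())    # MAX(continue_days)
    return result

print(max_consecutive_days(rows))  # → {'1': 6, '2': 3}
```

Note that the output matches the Hive result above: merchant 1's longest streak is the six days 2017-07-01 through 2017-07-06, and merchant 2's is the three days 2017-07-13 through 2017-07-15.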