文档章节

用SQL做数据分析_1

Master_Li
 Master_Li
发布于 2017/08/05 18:47
字数 1282
阅读 75
收藏 0

题目:给定经纬度区间(经度(10,20),纬度(100,200)),50%以上订单起点在该区间内,定义司机为市区司机,相反,若50%以上订单起点在该区间外则定义司机为郊县司机。在北京市随机取1000个服务分在【90,100】的快车司机,请写出SQL代码查询市区司机和郊县司机的数量和市区与郊县司机占比。

分析:

首先在北京市随机取1000个服务分在【90,100】的快车司机,将结果作为司机表d

select 
	Top 1000 a.*, 
	c.service_score
	from a,c
	where (a.driver_id = c.driver_id) 
		and (c.service_score between 90 and 100) 
		and (a.driver_car_lev in ('1','2','3'))
		and (a.city_id = 1)

将司机表和订单表联合得到司机订单表,一个司机对应多个订单,每个订单对应一个起点经纬度,然后以driver_id分组,同时计算每个订单是属于市区订单还是郊区订单,将结果作为表e

select 
		b.*,
		COUNT(*) as order1,
		SUM(case when (current_lng between 10 and 20) and (current_lan between 100 and 200)  then 1 else 0 end) as order2,
		SUM(case when (current_lng not between 10 and 20) or (current_lan not between 100 and 200)  then 1 else 0 end) as order3
		from (select 
				Top 1000 a.*, 
				c.service_score
				from a,c
				where (a.driver_id = c.driver_id) 
					and (c.service_score between 90 and 100) 
					and (a.driver_car_lev in ('1','2','3'))
					and (a.city_id = 1))as d,b
		where b.driver_id = d.driver_id
		group by b.driver_id,b.order_id,b.current_lng,b.current_lan

对于表e,以司机为单位,计算每个司机的总订单数,市区订单数,市区订单比例,郊区订单数,郊区订单比例。将每个司机的处理结果作为表f

select 
	driver_id,
	COUNT(order1) as tlnum,
	SUM(order2) as sqnum,
	SUM(order2)*1.0/COUNT(order1) as sqrate,
	SUM(order3) as jqnum,
	SUM(order3)*1.0/COUNT(order1) as jqrate
	from 
	(select 
		b.*,
		COUNT(*) as order1,
		SUM(case when (current_lng between 10 and 20) and (current_lan between 100 and 200)  then 1 else 0 end) as order2,
		SUM(case when (current_lng not between 10 and 20) or (current_lan not between 100 and 200)  then 1 else 0 end) as order3
		from (select 
				Top 1000 a.*, 
				c.service_score
				from a,c
				where (a.driver_id = c.driver_id) 
					and (c.service_score between 90 and 100) 
					and (a.driver_car_lev in ('1','2','3'))
					and (a.city_id = 1))as d,b
		where b.driver_id = d.driver_id
		group by b.driver_id,b.order_id,b.current_lng,b.current_lan) as e
	group by driver_id 	

最后,只要对表f做一个统计和判断,即可得到市区司机和郊县司机的数量和市区与郊县司机占比

select 
	COUNT(*) as tldriver,
	SUM(case when (sqrate > 0.5) then 1 else 0 end) as sqdriver,
	SUM(case when (sqrate > 0.5) then 1 else 0 end)*1.0/COUNT(*) as sqdeiverrate,
	SUM(case when (jqrate > 0.5) then 1 else 0 end) as jqdriver,
	SUM(case when (jqrate > 0.5) then 1 else 0 end)*1.0/COUNT(*) as jqdeiverrate
	from
	(select 
		driver_id,
		COUNT(order1) as tlnum,
		SUM(order2) as sqnum,
		SUM(order2)*1.0/COUNT(order1) as sqrate,
		SUM(order3) as jqnum,
		SUM(order3)*1.0/COUNT(order1) as jqrate
		from 
			(select 
				b.*,
				COUNT(*) as order1,
				SUM(case when (current_lng between 10 and 20) and (current_lan between 100 and 200)  then 1 else 0 end) as order2,
				SUM(case when (current_lng not between 10 and 20) or (current_lan not between 100 and 200)  then 1 else 0 end) as order3
				from (select 
						Top 1000 a.*, 
						c.service_score
						from a,c
						where (a.driver_id = c.driver_id) 
							and (c.service_score between 90 and 100) 
							and (a.driver_car_lev in ('1','2','3'))
							and (a.city_id = 1))as d,b
				where b.driver_id = d.driver_id
				group by b.driver_id,b.order_id,b.current_lng,b.current_lan) as e
		group by driver_id) as f 	

答案:

为了测试SQL的正确性,以SQL Server数据库建表,使用C#程序批量填入数据

①使用对象资源管理器创建表a,b,c

②使用C#批量插入数据

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace DD
{
    class Program
    {
     
        static void Main(string[] args)
        {
            String CONNECTION_STRING = "server = .;database=DD;Integrated Security = true;";

            SqlConnection conn = new SqlConnection(CONNECTION_STRING);
            conn.Open();
            
            SqlCommand cmd = new SqlCommand();
            cmd.Connection = conn;
            cmd.CommandType = CommandType.Text;


            for (int i = 1; i <=1000; i++)
            {
                string sql =
                       "insert into a(driver_id, city_id, driver_car_lev) values (" + Convert.ToString(i) + "," + Convert.ToString(1) + "," + Convert.ToString(1) + ")";
                cmd.CommandText = sql;
                cmd.ExecuteNonQuery();
            }


            for(int i =1; i<=600; i++){
                for (int j = 1; j <= 3; j++)
                {
                    string sql =
                       "insert into b(driver_id, order_id, current_lng, current_lan) values (" + Convert.ToString(i) + "," + Convert.ToString(j) + "," + Convert.ToString(new Random().Next(10) + 10) + "," + Convert.ToString(new Random().Next(100) + 100) + ")";
                    cmd.CommandText = sql;
                    cmd.ExecuteNonQuery();
                }
                for (int j = 4; j <= 4; j++)
                {
                    string sql =
                       "insert into b(driver_id, order_id, current_lng, current_lan) values (" + Convert.ToString(i) + "," + Convert.ToString(j) + "," + Convert.ToString(new Random().Next(10) + 30) + "," + Convert.ToString(new Random().Next(100) + 300) + ")";
                    cmd.CommandText = sql;
                    cmd.ExecuteNonQuery();
                }
            }
            for (int i = 601; i <= 1000; i++)
            {
                for (int j = 1; j <= 3; j++)
                {
                    string sql =
                      "insert into b(driver_id, order_id, current_lng, current_lan) values (" + Convert.ToString(i) + "," + Convert.ToString(j) + "," + Convert.ToString(new Random().Next(10) + 30) + "," + Convert.ToString(new Random().Next(100) + 300) + ")";
                    cmd.CommandText = sql;
                    cmd.ExecuteNonQuery();
                }
                for (int j = 4; j <= 4; j++)
                {
                    string sql =
                       "insert into b(driver_id, order_id, current_lng, current_lan) values (" + Convert.ToString(i) + "," + Convert.ToString(j) + "," + Convert.ToString(new Random().Next(10) + 10) + "," + Convert.ToString(new Random().Next(100) + 100) + ")";
                    cmd.CommandText = sql;
                    cmd.ExecuteNonQuery();
                }
            }

            for (int i = 1; i <= 1000; i++)
            {
                string sql =
                       "insert into c(driver_id, service_score) values (" + Convert.ToString(i) + "," + Convert.ToString(new Random().Next(10) + 90) + ")";
                cmd.CommandText = sql;
                cmd.ExecuteNonQuery();
            }


            //SqlDataReader reader = cmd.ExecuteReader();
            //System.Console.Write("column1\tcolumn2\tcolumn3\n");
            //while (reader.Read())
            //{
            //    System.Console.Write(reader["driver_id"]);
            //    System.Console.Write(reader["city_id"]);
            //    System.Console.Write(reader["driver_car_lev"]);
            //}
            conn.Close();
            System.Console.Write("Insert complete!");
            Console.ReadKey(true);//暂停  

        }
    }
}

③测试SQL脚本

select 
	COUNT(*) as tldriver,
	SUM(case when (sqrate > 0.5) then 1 else 0 end) as sqdriver,
	SUM(case when (sqrate > 0.5) then 1 else 0 end)*1.0/COUNT(*) as sqdeiverrate,
	SUM(case when (jqrate > 0.5) then 1 else 0 end) as jqdriver,
	SUM(case when (jqrate > 0.5) then 1 else 0 end)*1.0/COUNT(*) as jqdeiverrate
	from
	(select 
		driver_id,
		COUNT(order1) as tlnum,
		SUM(order2) as sqnum,
		SUM(order2)*1.0/COUNT(order1) as sqrate,
		SUM(order3) as jqnum,
		SUM(order3)*1.0/COUNT(order1) as jqrate
		from 
			(select 
				b.*,
				COUNT(*) as order1,
				SUM(case when (current_lng between 10 and 20) and (current_lan between 100 and 200)  then 1 else 0 end) as order2,
				SUM(case when (current_lng not between 10 and 20) or (current_lan not between 100 and 200)  then 1 else 0 end) as order3
				from (select 
						Top 1000 a.*, 
						c.service_score
						from a,c
						where (a.driver_id = c.driver_id) 
							and (c.service_score between 90 and 100) 
							and (a.driver_car_lev in ('1','2','3'))
							and (a.city_id = 1))as d,b
				where b.driver_id = d.driver_id
				group by b.driver_id,b.order_id,b.current_lng,b.current_lan) as e
		group by driver_id) as f 	

结果:表d

表e

表f

最终结果

 

Over!!!

© 著作权归作者所有

共有 人打赏支持
Master_Li
粉丝 1
博文 74
码字总数 57794
作品 0
南京
程序员
加载中

评论(2)

Master_Li
Master_Li

引用来自“lijiachen56”的评论

您好大神,想请教一下表f
SUM(order2)*1.0/COUNT(order1) as sqrate
为什么要*1.0呢?
因为不确定sum()和count()返回的类型,所以为了保证结果是浮点类型,*1.0
l
lijiachen56
您好大神,想请教一下表f
SUM(order2)*1.0/COUNT(order1) as sqrate
为什么要*1.0呢?
加米谷:数据分析师是做什么的,需要掌握哪些技能?

数据分析师是做什么的: 1、支持各种常规或临时数据分析需求; 2、提供各类业务相关的分析及建议; 3、通过建模深入挖掘用户或产品方面的有价值的信息; 4、和各部门沟通协调需求并提出各种新...

加米谷大数据
07/13
0
0
大数据分析师是做什么的?需要掌握哪些技能?

数据分析师是做什么的: 1、支持各种常规或临时数据分析需求; 2、提供各类业务相关的分析及建议; 3、通过建模深入挖掘用户或产品方面的有价值的信息; 4、和各部门沟通协调需求并提出各种新...

加米谷
07/13
0
0
mysql多表查询实例结果及分析-2017.12.14

今天帮运营导数据时发现sql执行速度非常慢,嗯,按照之前的编写习惯,后来跟架构师一起分析讨论,改进了下,现记录分析如下 一、测试表、sql情况 set @bgDate = '20171213';set @edDate = '...

Ye_Kwii
2017/12/14
0
0
我是如何入门、成长并进阶为数据分析师的?

前几天和一朋友A聊天,一个在我看来完全可以在大部分行业领域公司独当一面的数据分析师,目前仍谦虚的以“数据猿”自称。从事数据分析的他,半路出家,起初虽然懂一些数据库的知识,但仍然和...

雄霸天下啦
2017/06/22
0
0
实战演练:洞若观火--治堵之道在清源

1 观察:把问题说清楚 在我印象中,DBA 都是很明事理的人,因为这个职业经常要干的一件事情就是:“把问题说清楚,把规矩讲明白”。这是一项软技能,不管是问题分析过程中与同行沟通,还是最...

技术小能手
07/09
0
0

没有更多内容

加载失败,请刷新页面

加载更多

面向对象设计原则(OOP)

单一职责原则(Single responsibility principle)又称单一功能原则。它规定一个类应该只有一个发生变化的原因。 核心原则:低耦合,高内聚。 一个类,应该只有一个引起它变化的原因,也就是...

gackey
32分钟前
2
0
C++ 锁

C++提供了两种常用的锁,std::lock_guard<Lockable &T>和std::unique_lock<Lockable &T>。通常使用场景下,这两个锁用法一致。即,在构造锁对象时上锁,在析构锁对象时解锁。使用户从上锁/解...

yepanl
36分钟前
2
0
Kali Linux Docker 練習

docker pull kalilinux/kali-linux-docker docker run -t -i kalilinux/kali-linux-docker /bin/bash apt-get update apt-get install htop apt-get install nmap apt-get install wpscan ap......

BaiyuanLab
今天
2
0
通俗大白话来理解TCP协议的三次握手和四次分手

最近在恶补计算机网络方面的知识,之前对于TCP的三次握手和四次分手也是模模糊糊,对于其中的细节更是浑然不知,最近看了很多这方面的知识,也在系统的学习计算机网络,加深自己的CS功底,就...

onedotdot
今天
2
0
TiDB 在爱奇艺的应用及实践

爱奇艺,中国高品质视频娱乐服务提供者,2010 年 4 月 22 日正式上线,推崇品质、青春、时尚的品牌内涵如今已深入人心,网罗了全球广大的年轻用户群体,积极推动产品、技术、内容、营销等全方...

TiDB
今天
1
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部