
Junk Dimension

我是彩笔
Published 2015/04/16 09:39

In data warehouse design, we frequently run into yes/no indicator fields and other low-cardinality flags in the source system. Business analysis often shows that this information must be kept in the fact table. However, if we keep each indicator as its own dimension, we need to build many small dimension tables, and the fact table gains one extra key column per indicator, which can lead to performance and management issues.

A junk dimension is one way to solve this problem. In a junk dimension, we combine these indicator fields into a single dimension. This way, we only need to build one dimension table, and both the number of fields in the fact table and its size decrease. The junk dimension table contains one row for each possible combination of values of the individual indicator fields.

Let's look at an example. Assume that we have the following fact table:

[Figure: Fact Table Before Junk Dimension]

In this example, TXN_CODE, COUPON_IND, and PREPAY_IND are all indicator fields. In this existing format, each one of them is a dimension. Using the junk dimension principle, we can combine them into a single junk dimension, resulting in the following fact table:

[Figure: Fact Table With Junk Dimension]

Note that the number of dimensions in the fact table has dropped from 7 to 5.

The content of the junk dimension table would look like the following:

[Figure: Junk Dimension Example]

In this case, we have 3 possible values for the TXN_CODE field, 2 possible values for the COUPON_IND field, and 2 possible values for the PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk dimension table.
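The row count can be checked with a minimal Python sketch that enumerates every combination. The concrete values (1, 2, 3 and "N"/"Y") and the surrogate key name JUNK_ID are assumptions made for illustration; the text above only states how many values each field has.

```python
from itertools import product

# Assumed value domains: the article gives only the counts (3, 2, 2).
TXN_CODES = (1, 2, 3)
COUPON_INDS = ("N", "Y")
PREPAY_INDS = ("N", "Y")


def build_junk_dimension():
    """Return one row per value combination, keyed by a surrogate JUNK_ID."""
    rows = []
    combos = product(TXN_CODES, COUPON_INDS, PREPAY_INDS)
    for junk_id, (txn, coupon, prepay) in enumerate(combos, start=1):
        rows.append(
            {
                "JUNK_ID": junk_id,  # hypothetical surrogate key name
                "TXN_CODE": txn,
                "COUPON_IND": coupon,
                "PREPAY_IND": prepay,
            }
        )
    return rows


junk_dimension = build_junk_dimension()
print(len(junk_dimension))  # 3 x 2 x 2 rows
```

Because the table is a full Cartesian product, it can be loaded once up front and does not need to change as new fact rows arrive, as long as the value domains stay the same.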

By using a junk dimension to replace the 3 indicator fields, we have decreased the number of dimensions by 2 and the number of fields in the fact table by 2. The result is a data warehousing environment that offers better performance and is easier to manage.
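As a rough illustration of the replacement step, the sketch below swaps the three indicator columns of a fact row for a single junk key. The column names CUSTOMER_ID, PRODUCT_ID, AMOUNT, and JUNK_ID, the indicator values, and the helper names are all hypothetical; only TXN_CODE, COUPON_IND, and PREPAY_IND come from the example above.

```python
from itertools import product

INDICATOR_COLUMNS = ("TXN_CODE", "COUPON_IND", "PREPAY_IND")


def build_junk_lookup():
    """Map each (TXN_CODE, COUPON_IND, PREPAY_IND) combination to a junk key.

    The value domains are assumed; the article states only their sizes.
    """
    combos = product((1, 2, 3), ("N", "Y"), ("N", "Y"))
    return {combo: junk_id for junk_id, combo in enumerate(combos, start=1)}


def to_junk_fact_row(row, lookup):
    """Return a copy of the fact row with the indicators replaced by JUNK_ID."""
    key = tuple(row[col] for col in INDICATOR_COLUMNS)
    slim = {k: v for k, v in row.items() if k not in INDICATOR_COLUMNS}
    slim["JUNK_ID"] = lookup[key]  # raises KeyError on an unknown combination
    return slim


lookup = build_junk_lookup()
fact_row = {
    "CUSTOMER_ID": 101,  # hypothetical column
    "PRODUCT_ID": 7,     # hypothetical column
    "TXN_CODE": 2,
    "COUPON_IND": "Y",
    "PREPAY_IND": "N",
    "AMOUNT": 19.99,     # hypothetical measure
}
slim_row = to_junk_fact_row(fact_row, lookup)
print(slim_row)
```

Three columns go out and one comes in, so the fact row ends up two fields narrower, which matches the reduction described above.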


Reposted from: http://www.1keydata.com/datawarehousing/junk-dimension.html
