文档章节

Junk Dimension

我是彩笔
 我是彩笔
发布于 2015/04/16 09:39
字数 336
阅读 23
收藏 0

In data warehouse design, frequently we run into a situation where there are yes/no indicator fields in the source system. Through business analysis, we know it is necessary to keep such information in the fact table. However, if keep all those indicator fields in the fact table, not only do we need to build many small dimension tables, but the amount of information stored in the fact table also increases tremendously, leading to possible performance and management issues.

Junk dimension is the way to solve this problem. In a junk dimension, we combine these indicator fields into a single dimension. This way, we'll only need to build a single dimension table, and the number of fields in the fact table, as well as the size of the fact table, can be decreased. The content in the junk dimension table is the combination of all possible values of the individual indicator fields.

Let's look at an example. Assuming that we have the following fact table:

Fact Table Before Junk Dimension

In this example, TXN_CODE, COUPON_IND, and PREPAY_IND are all indicator fields. In this existing format, each one of them is a dimension. Using the junk dimension principle, we can combine them into a single junk dimension, resulting in the following fact table:

Fact Table With Junk Dimension

Note that now the number of dimensions in the fact table went from 7 to 5.

The content of the junk dimension table would look like the following:

Junk Dimension Example

In this case, we have 3 possible values for the TXN_CODE field, 2 possible values for the COUPON_IND field, and 2 possible values for the PREPAY_IND field. This results in a total of 3 x 2 x 2 = 12 rows for the junk dimension table.

By using a junk dimension to replace the 3 indicator fields, we have decreased the number of dimensions by 2 and also decreased the number of fields in the fact table by 2. This will result in a data warehousing environment that offer better performance as well as being easier to manage.


本文转载自:http://www.1keydata.com/datawarehousing/junk-dimension.html

共有 人打赏支持
我是彩笔
粉丝 7
博文 23
码字总数 1936
作品 0
浦东
What is a Junk Dimension in Datawarehousing

The junk dimension is simply a structure that provides a convenient place to store the junk attributes. It is just a collection of random transactional codes, flags and/or text ......

我是彩笔
2015/04/16
0
0
EXTMAIL无法将标记为垃圾邮件的邮件自动投递到垃圾邮箱

各位大神,小弟装了个邮件系统, postfix+extmail+dovecot+maildrop+mailscanner+clamav+spamassassin 已经能够正常工作,就是还有一个问题,无法投递垃圾邮件到垃圾邮箱 MailScanner.conf 中...

evilotaku
2017/01/22
382
0
64位CentOS6编译Android2.2

-bash: ./prebuilt/linux-x86/toolchain/arm-eabi-4.4.0/bin/arm-eabi-gcc: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory 邮件: http://www.redhat.com/archives/r......

shouyong
2012/12/07
0
0
centos 编译安装glibc-2.12.1

在使用daikon建议的kvasir工具时,碰到一些问题,首先是因为该工具比较老,只支持内核最高为2.6的linux系统,其次需要的glibc版本最高为2.11(实在忍不住想吐个槽)。最后我把linux系统换成了...

active_health
2016/06/02
241
0
7 - VC维度(VC Dimension)-- 衡量模型与样本的复杂度

VC Dimension的定义 我们知道dichotomies数量的上限是成长函数,成长函数的上限是边界函数: 边界函数的上限就是N^(k-1)了: 于是我们得到了上限(成长函数)的上限(边界函数)的上限。。。...

Lee的白板报
2014/03/31
0
0

没有更多内容

加载失败,请刷新页面

加载更多

MySQL面试题集锦

什么是数据库索引?索引有哪几种类型?什么是最左前缀原则?索引算法有哪些?有什么区别? 索引是对数据库表中一列或多列的值进行排序的一种结构。一个非常恰当的比喻就是书的目录页与书的正...

老道士
20分钟前
0
0
使用 LogStash 归集日志

elastic 官网: https://www.elastic.co/ 为了便于集中查看多台主机的业务日志,使用 Filebeat, Redis, Logstash的方式进行收集: (1) Filebeat 监控日志文件的变化, 将新增部分写入redis中, 每...

ouhoo
24分钟前
0
0
java序列化(六) - protostuff序列化

添加依赖 <dependency> <groupId>io.protostuff</groupId> <artifactId>protostuff-core</artifactId> <version>1.5.9</version> </de......

晨猫
30分钟前
0
0
Ceph学习笔记1-Mimic版本多节点部署

特别说明: 本方法也可以用于单节点部署,只部署一个Monitor(只是会形成单点故障而已),最低要求是使用两个分区创建2个OSD(因为默认最小副本是2);如果不需要使用CephFS,则可以不部署M...

LastRitter
33分钟前
0
0
923. 3Sum With Multiplicity - LeetCode

Question 923. 3Sum With Multiplicity Solution 题目大意: 给一个int数组A和一个目标值target,求满足下面两个条件的组合个数,其中i,j,k分别为数组的索引 i<j<k target = A[i] + A[j] + A[k...

yysue
34分钟前
0
0

没有更多内容

加载失败,请刷新页面

加载更多

返回顶部
顶部