文档章节

What is a Junk Dimension in Datawarehousing

我是彩笔
 我是彩笔
发布于 2015/04/16 09:38
字数 597
阅读 25
收藏 0

The junk dimension is simply a structure that provides a convenient place to store the junk attributes. It is just a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension.
In OLTP tables that are full of flag fields and yes/no attributes, many of which are used for operational support and have no documentation except for the column names and the memory banks of the person who created them. Not only do those types of attributes not integrate easily into conventional dimensions such as Customer, Vendor, Time, Location, and Product, but you also don’t want to carry bad design into the data warehouse.However, some of the miscellaneous attributes will contain data that has significant business value, so you have to do something with them.

This scenario is especially common in legacy systems and databases that were created without solid, underlying design principles. Column names such as Completed, Packed, Shipped, Received, Delivered, and Returned (each with yes/no data values) are very common, and they do have business value.These miscellaneous indicators and flags that don't logically belong to the core dimension tables.  They are either too valuable to ignore or exclude.Often the meaning of the flags and text attributes is obscure. This situation leaves the designer with a number of bad alternatives
Designers sometimes want to treat them as Fact or make it into numerous small Dimensional tables. However, all of these options are less than ideal. Discarding the data can be dangerous because the miscellaneous values, flags, and yes/no fields might contain valuable business data. Including the miscellaneous attributes in the fact table could cause the fact table to swell to alarming proportions, especially if you have more than just a few miscellaneous attributes. The increased size of the fact table could cause serious performance problems because of the reduced number of records per physical I/O. Even if you tried to index these fields to minimize the performance problems, you still wouldn’t gain anything because so many of the miscellaneous fields contain flag values such as 0 and 1; Y and N; or open, pending, and closed.A third, less obvious but preferable, solution is to incorporate a Junk Dimension as a holding place for these flags and indicators.Advantage of junk dimension:

  • It provides a recognizable location for related codes, indicators and their descriptors in a dimensional framework.

  • This avoids the creation of multiple dimension tables.

  • Provide a smaller, quicker point of entry for queries compared to performance when these attributes are directly in the fact table. 

  • An interesting use for a junk dimension is to capture the context of a specific transaction.  While our common, conformed dimensions contain the key dimensional attributes of interest, there are likely attributes about the transaction that are not known until the transaction is processed.

Above figure shows a junk dimension. As in any dimensional design, each of the rows in the fact table will be associated with a row in this junk dimension.Simple Datawarehouse - Junk DimensionYou want to keep the data warehouse design as simple and straightforward as possible, so that users will be able to access data easily. Miscellaneous attributes that contain business value are a challenge to include in your data warehouse design because they don’t fit neatly into conventional dimensions, and if improperly handled, can cause the data warehouse to swell in size and perform suboptimally. By placing miscellaneous attributes into junk dimensions, you can circumvent both of these problems.


本文转载自:http://dwhlaureate.blogspot.in/2012/08/junk-dimension.html

共有 人打赏支持
我是彩笔
粉丝 7
博文 23
码字总数 1936
作品 0
浦东
EXTMAIL无法将标记为垃圾邮件的邮件自动投递到垃圾邮箱

各位大神,小弟装了个邮件系统, postfix+extmail+dovecot+maildrop+mailscanner+clamav+spamassassin 已经能够正常工作,就是还有一个问题,无法投递垃圾邮件到垃圾邮箱 MailScanner.conf 中...

evilotaku
2017/01/22
382
0
centos 编译安装glibc-2.12.1

在使用daikon建议的kvasir工具时,碰到一些问题,首先是因为该工具比较老,只支持内核最高为2.6的linux系统,其次需要的glibc版本最高为2.11(实在忍不住想吐个槽)。最后我把linux系统换成了...

active_health
2016/06/02
241
0
64位CentOS6编译Android2.2

-bash: ./prebuilt/linux-x86/toolchain/arm-eabi-4.4.0/bin/arm-eabi-gcc: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory 邮件: http://www.redhat.com/archives/r......

shouyong
2012/12/07
0
0
python之commands模块(执行Linux Shell命令)

commands模块 用于执行Linux shell命令,要获得shell命令的输出只需要在后面参数写入('命令')就可以了。 需要得到命令执行的状态则需要判断$?的值, 在Python中有一个模块commands也很容易做到...

东丶
2017/04/16
0
0
7 - VC维度(VC Dimension)-- 衡量模型与样本的复杂度

VC Dimension的定义 我们知道dichotomies数量的上限是成长函数,成长函数的上限是边界函数: 边界函数的上限就是N^(k-1)了: 于是我们得到了上限(成长函数)的上限(边界函数)的上限。。。...

Lee的白板报
2014/03/31
0
0

没有更多内容

加载失败,请刷新页面

加载更多

下一页

JS三元运算示例

1. topFlag=topFlag ==0?1:0; 等于 if(topFlag=00){ topFlag=1; }else if(topFlag == 1){ topFlag=0; } 2. 5>3?alert('5大'):alert('3大'); 即 if(5>3){alert('5大')}else{alert('3大')}; 注......

森火
57分钟前
0
0
利用Slf4j的MDC跟踪方法调用链

why? 一个web项目通常提供很多URL访问地址, 项目一般都是分层处理,例如Controller——>Service——>DAO。 如果想根据日志查看用户一次请求都走了哪些方法(多数是查错误)。 如果系统是多人...

杨春炼
今天
6
0
Maven介绍及安装

Maven介绍及安装 以下内容是本人早期学习时的笔记,可能比较详实繁琐,现在复习一下Maven,顺便将内容抛出来,供大家一起学习进步。 一、Maven简介 Maven是Apache旗下的一款项目管理工具,是...

星汉
今天
0
0
小程序Aes解密

主要步骤: 1、下载AES源码(JS版) 2、在小程序中新建一个公共的文件夹,把AES源码拷贝进去(注意:需要暴露接口 module.exports = CryptoJS;) 3、添加一个用于加密解密的公共JS,可取名为...

Mr_Tea伯奕
今天
0
0
Go实现文件传输(基本传输可用)

发送端 package mainimport ("fmt""os""net""io")func SendFile(path string, connect net.Conn){file, oerr :=os.Open(path)if oerr !=nil{fmt.Println("Open", oerr)......

CHONGCHEN
今天
2
0

没有更多内容

加载失败,请刷新页面

加载更多

下一页

返回顶部
顶部