What Is a Junk Dimension in Data Warehousing

我是彩笔
Published 2015/04/16 09:38

A junk dimension is simply a structure that provides a convenient place to store junk attributes: a collection of miscellaneous transactional codes, flags, and/or text attributes that are unrelated to any particular dimension.

OLTP tables are often full of flag fields and yes/no attributes, many of which are used for operational support and have no documentation except for the column names and the memory banks of the person who created them. Not only do those types of attributes not integrate easily into conventional dimensions such as Customer, Vendor, Time, Location, and Product, but you also don't want to carry bad design into the data warehouse. However, some of the miscellaneous attributes will contain data that has significant business value, so you have to do something with them.

This scenario is especially common in legacy systems and databases that were created without solid, underlying design principles. Column names such as Completed, Packed, Shipped, Received, Delivered, and Returned (each with yes/no data values) are very common, and they do have business value. These miscellaneous indicators and flags don't logically belong to the core dimension tables, yet they are too valuable to ignore or exclude. Often the meaning of the flags and text attributes is obscure. This situation leaves the designer with a number of bad alternatives.

Designers are sometimes tempted to discard these attributes, to store them directly in the fact table, or to split them into numerous small dimension tables. All of these options are less than ideal. Discarding the data can be dangerous because the miscellaneous values, flags, and yes/no fields might contain valuable business data. Including the miscellaneous attributes in the fact table could cause the fact table to swell to alarming proportions, especially if you have more than just a few miscellaneous attributes. The increased size of the fact table could cause serious performance problems because fewer records fit into each physical I/O. Even if you tried to index these fields to minimize the performance problems, you still wouldn't gain much, because so many of the miscellaneous fields contain only a handful of distinct values such as 0 and 1; Y and N; or open, pending, and closed.

A less obvious but preferable solution is to incorporate a junk dimension as a holding place for these flags and indicators.

Advantages of a junk dimension:

  • It provides a recognizable location for related codes, indicators, and their descriptors in a dimensional framework.

  • It avoids the creation of multiple dimension tables.

  • It provides a smaller, quicker point of entry for queries than when these attributes are stored directly in the fact table.

  • An interesting use for a junk dimension is to capture the context of a specific transaction. While our common, conformed dimensions contain the key dimensional attributes of interest, there are likely attributes about the transaction that are not known until the transaction is processed.
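
To make the idea concrete, here is a minimal Python sketch of how a junk dimension might be built and used. It assumes three yes/no flags borrowed from the column names mentioned earlier; the function names, the `junk_key` surrogate key, and the sample order are hypothetical and only illustrate the pattern.

```python
from itertools import product

# Hypothetical yes/no flags, borrowed from the column names mentioned above.
FLAG_NAMES = ("completed", "packed", "shipped")


def build_junk_dimension(flag_names):
    """Return {surrogate_key: {flag: value}} for every Y/N combination."""
    rows = {}
    combos = product("YN", repeat=len(flag_names))
    for key, combo in enumerate(combos, start=1):
        rows[key] = dict(zip(flag_names, combo))
    return rows


def build_lookup(junk_dim, flag_names):
    """Map each tuple of flag values back to its surrogate key."""
    return {
        tuple(row[name] for name in flag_names): key
        for key, row in junk_dim.items()
    }


junk_dim = build_junk_dimension(FLAG_NAMES)
lookup = build_lookup(junk_dim, FLAG_NAMES)

# A source transaction carries three separate flag columns...
source_row = {
    "order_id": 1001,
    "amount": 25.0,
    "completed": "N",
    "packed": "Y",
    "shipped": "Y",
}

# ...but the fact row stores a single small integer key in their place.
fact_row = {
    "order_id": source_row["order_id"],
    "amount": source_row["amount"],
    "junk_key": lookup[tuple(source_row[name] for name in FLAG_NAMES)],
}
```

With three yes/no flags the dimension holds at most 2 × 2 × 2 = 8 rows, no matter how many fact rows reference it. Pre-generating every combination is only one option; when there are many flags, a loader might instead insert only the combinations that actually occur in the source data.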

[Figure: Simple Datawarehouse - Junk Dimension]

The figure above shows a junk dimension. As in any dimensional design, each of the rows in the fact table will be associated with a row in this junk dimension.

You want to keep the data warehouse design as simple and straightforward as possible, so that users will be able to access data easily. Miscellaneous attributes that contain business value are a challenge to include in your data warehouse design because they don't fit neatly into conventional dimensions and, if improperly handled, can cause the data warehouse to swell in size and perform suboptimally. By placing miscellaneous attributes into junk dimensions, you can circumvent both of these problems.
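
The association between fact rows and junk dimension rows can also be sketched in SQL. The example below uses Python's built-in `sqlite3` module with an in-memory database; the table names (`junk_dim`, `order_fact`), the column names, and the sample data are all assumptions made up for illustration, not taken from the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the fact table carries one junk_key column
# instead of one column per flag.
conn.executescript("""
    CREATE TABLE junk_dim (
        junk_key INTEGER PRIMARY KEY,
        shipped  TEXT NOT NULL,
        returned TEXT NOT NULL
    );
    CREATE TABLE order_fact (
        order_id INTEGER PRIMARY KEY,
        amount   REAL NOT NULL,
        junk_key INTEGER NOT NULL REFERENCES junk_dim (junk_key)
    );
""")

# One junk dimension row per combination of flag values.
conn.executemany(
    "INSERT INTO junk_dim VALUES (?, ?, ?)",
    [(1, "Y", "Y"), (2, "Y", "N"), (3, "N", "Y"), (4, "N", "N")],
)

# Made-up fact rows; each one points at exactly one junk dimension row.
conn.executemany(
    "INSERT INTO order_fact VALUES (?, ?, ?)",
    [(1001, 25.0, 2), (1002, 40.0, 2), (1003, 15.0, 1), (1004, 60.0, 4)],
)

# Total amount for orders that were shipped and not returned.
total = conn.execute("""
    SELECT SUM(f.amount)
    FROM order_fact AS f
    JOIN junk_dim AS j ON j.junk_key = f.junk_key
    WHERE j.shipped = 'Y' AND j.returned = 'N'
""").fetchone()[0]

# Count fact rows that fail to match any junk dimension row.
orphans = conn.execute("""
    SELECT COUNT(*)
    FROM order_fact AS f
    LEFT JOIN junk_dim AS j ON j.junk_key = f.junk_key
    WHERE j.junk_key IS NULL
""").fetchone()[0]
```

The flag filter is evaluated against the four-row dimension rather than against flag columns repeated on every fact row, which is the kind of query entry point the junk dimension is meant to provide.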


Reposted from: http://dwhlaureate.blogspot.in/2012/08/junk-dimension.html
