What is a Junk Dimension in Data Warehousing

我是彩笔
Published 2015/04/16 09:38

A junk dimension is simply a structure that provides a convenient place to store junk attributes. It is a collection of random transactional codes, flags, and/or text attributes that are unrelated to any particular dimension.
OLTP tables are often full of flag fields and yes/no attributes, many of which are used for operational support and have no documentation except for the column names and the memory of the person who created them. Not only do these attributes not integrate easily into conventional dimensions such as Customer, Vendor, Time, Location, and Product, but you also don't want to carry bad design into the data warehouse. However, some of the miscellaneous attributes contain data that has significant business value, so you have to do something with them.

This scenario is especially common in legacy systems and databases that were created without solid underlying design principles. Column names such as Completed, Packed, Shipped, Received, Delivered, and Returned (each with yes/no data values) are very common, and they do have business value. These miscellaneous indicators and flags don't logically belong to the core dimension tables, yet they are too valuable to ignore or exclude. Often the meaning of the flags and text attributes is obscure. This situation leaves the designer with a number of bad alternatives.
Designers sometimes want to treat them as facts or split them into numerous small dimension tables. However, none of these options is ideal. Discarding the data can be dangerous because the miscellaneous values, flags, and yes/no fields might contain valuable business data. Including the miscellaneous attributes in the fact table could cause the fact table to swell to alarming proportions, especially if you have more than just a few miscellaneous attributes. The increased size of the fact table could cause serious performance problems because fewer records fit into each physical I/O. Even if you indexed these fields to reduce the performance problems, you would gain little, because so many of the miscellaneous fields hold only a few distinct values such as 0 and 1; Y and N; or open, pending, and closed, which makes an index poorly selective.
A less obvious but preferable solution is to incorporate a junk dimension as a holding place for these flags and indicators.
Advantages of a junk dimension:

  • It provides a recognizable location for related codes, indicators and their descriptors in a dimensional framework.

  • It avoids the creation of multiple small dimension tables.

  • It provides a smaller, quicker point of entry for queries than when these attributes are stored directly in the fact table.

  • An interesting use for a junk dimension is to capture the context of a specific transaction.  While our common, conformed dimensions contain the key dimensional attributes of interest, there are likely attributes about the transaction that are not known until the transaction is processed.
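
As a rough illustration of the idea, the sketch below builds a junk dimension by enumerating every combination of a set of yes/no flags and assigning each combination a surrogate key. The flag names come from the column names mentioned earlier; the `Y`/`N` value domain, the `junk_key` column name, and the `build_junk_dimension` helper are assumptions made for this example, not part of the original article.

```python
from itertools import product

# Flag names taken from the article; the Y/N value domain is an assumption.
FLAGS = ["completed", "packed", "shipped", "received", "delivered", "returned"]
VALUES = ["Y", "N"]


def build_junk_dimension(flags, values):
    """Return one row per flag combination, keyed by a surrogate integer."""
    rows = []
    for key, combo in enumerate(product(values, repeat=len(flags)), start=1):
        row = {"junk_key": key}        # hypothetical surrogate key column
        row.update(zip(flags, combo))  # one column per flag
        rows.append(row)
    return rows


junk_dim = build_junk_dimension(FLAGS, VALUES)
print(len(junk_dim))  # 64 rows, i.e. 2 ** 6 combinations
print(junk_dim[0])
```

With six two-valued flags the dimension has only 64 rows, so each fact row can carry a single small integer key instead of six separate flag columns. In practice one might load only the combinations that actually occur in the source data rather than the full cross product.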

[Figure: Simple Datawarehouse - Junk Dimension]
The figure above shows a junk dimension. As in any dimensional design, each row in the fact table is associated with a row in this junk dimension.
You want to keep the data warehouse design as simple and straightforward as possible, so that users can access data easily. Miscellaneous attributes that contain business value are a challenge to include in your data warehouse design because they don't fit neatly into conventional dimensions and, if improperly handled, can cause the data warehouse to swell in size and perform suboptimally. By placing miscellaneous attributes into junk dimensions, you can circumvent both of these problems.
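
The association between fact rows and junk dimension rows can be sketched with Python's built-in `sqlite3` module. The table names (`junk_dim`, `order_fact`), the choice of three flags, the sample orders, and the `junk_key_for` helper are all hypothetical and exist only to show the pattern: look up the key for a transaction's flag combination at load time, then join back to the junk dimension at query time.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")

# Hypothetical junk dimension holding three of the article's yes/no flags.
conn.execute("""
    CREATE TABLE junk_dim (
        junk_key  INTEGER PRIMARY KEY,
        completed TEXT, shipped TEXT, returned TEXT,
        UNIQUE (completed, shipped, returned))""")

# Hypothetical fact table: one integer key replaces the flag columns.
conn.execute("""
    CREATE TABLE order_fact (
        order_id INTEGER PRIMARY KEY,
        amount   REAL,
        junk_key INTEGER REFERENCES junk_dim (junk_key))""")

# Pre-populate the dimension with all 2 ** 3 flag combinations.
conn.executemany(
    "INSERT INTO junk_dim (completed, shipped, returned) VALUES (?, ?, ?)",
    product("YN", repeat=3))


def junk_key_for(completed, shipped, returned):
    """Look up the surrogate key for one combination of flag values."""
    row = conn.execute(
        "SELECT junk_key FROM junk_dim "
        "WHERE completed = ? AND shipped = ? AND returned = ?",
        (completed, shipped, returned)).fetchone()
    return row[0]


# Made-up source rows: (order_id, amount, completed, shipped, returned).
source_rows = [
    (1, 120.0, "Y", "Y", "N"),
    (2, 35.5, "Y", "Y", "Y"),
    (3, 80.0, "N", "N", "N"),
]
for order_id, amount, *flags in source_rows:
    conn.execute("INSERT INTO order_fact VALUES (?, ?, ?)",
                 (order_id, amount, junk_key_for(*flags)))

# Query: orders that were shipped and not returned.
shipped_not_returned = conn.execute("""
    SELECT f.order_id, f.amount
    FROM order_fact AS f
    JOIN junk_dim AS j ON j.junk_key = f.junk_key
    WHERE j.shipped = 'Y' AND j.returned = 'N'
    ORDER BY f.order_id""").fetchall()
print(shipped_not_returned)  # [(1, 120.0)]
```

The filter on the flags runs against the eight-row dimension rather than against flag columns in the fact table, which is the "smaller, quicker point of entry" the article describes.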


Reposted from: http://dwhlaureate.blogspot.in/2012/08/junk-dimension.html
