Dimensions are Not Entities, or Why Data Structures Aren’t Equal

来源:百度文库 编辑:神马文学网 时间:2024/04/27 09:17:33
November 2004

Dimensions are Not Entities, or Why Data Structures Aren’t Equal
It takes some mental adjusting to adapt to data warehousing methods of data structure design. Both first-time data warehousers and experienced folks, who have kept their noses a little too close to that proverbial grindstone, can run into problems when attempting a query against a lone dimension table. These problems occur because a dimension is not the same thing as an entity. The previous statement is true even if the dimension is implemented as a table in a relational database. A dimension is simply not the same thing as a 3rd normal(-ish) form relational table. When an unsuspecting individual attempts to query a dimension table as if it were just another table, more often than not the querier will jump to the conclusion that something is wrong -- because the answers do not fit the expectations.
A normalized table includes data that is functionally dependent on the key of the table. Under this dependency, data is fairly narrow in scope. Alternately, a dimension‘s scope could be fairly broad. Superficially, the design of a dimension table should, after a fashion, place everything into a functional dependence on the dimension‘s generated primary key. But while confusingly close, these two concepts (normalized functional dependence vs. generated dimensional dependence) are not the same thing. In the normalized table, included data functionally depends on the identity of an occurrence of the object described by that table. Therefore, contained data should be related closely to that single kind of object regardless of whether it is people, animals, or postage stamps. The dimension table is different; its primary key is jerry-rigged, specifically so that the key fits the needs of the data. Data items may be closely related to a single object; or a range of hierarchies in the ballpark of the object; or even varying groups of otherwise unrelated items that work more conveniently for some kind of query when mashed together. As an example, a "customer" dimension could contain direct customer-dependent data items; or it could contain loosely related data items under a broad category referred to as "customer," such as items that might be found in normalized tables for customer, corporation, customer locations, customer representative, etc.; or even sales hierarchy data and shoe size descriptions shoved together with some customer elements designated as a group named "customer." Clearly, the dimension has purposefully put that primary-key-cart right there in front of the dependency-horse, which you may have noticed is indeed a horse of a different color.
On one hand, a normalized customer table would definitely lead one to believe that each row within the table represents a customer; on the other hand, the same cannot be said of a customer dimension. A dimension is designed so that valid answer sets are obtained through grouping or selecting values from inside the columns within the dimensions; there is no reliance on a value existing on a single row within the dimension. Due to the varying needs for managing the dimension content via slowly changing dimension logic, the number of rows found within the dimension table only indicates how often certain kinds of data change their values -- a trivial number, generally of limited interest. In order to emulate a normalized result against these dimensions, queries often must be formatted to return a count of distinct values within individual columns or column groupings, instead of performing the generic row counts usually executed. Therefore, people attempting to query a dimension as if it were no different than a normalized structure observe results that may be misleading. And in being misled, the user will look to the dimensional construction as being in error. All data structures are not created equal. Consider this the next time you need to write a query: look, and evaluate, before you leap. Dimensionalized designs are different, and that difference is not an accident. Multidimensional structures conform to a strict arrangement, albeit not a normalized one.
Todd Schraml is a contributing writer to DBTA and Data Architect at Innovative Health Strategies, a leading business intelligence services provider to pharmaceutical manufacturers through management and measurement of data related to contracting, marketing, and revenue generation strategies. He has many years experience in application development and maintenance; including over a decade focused on data warehousing, and several years experience in database administration on massively parallel processing database management systems. Across many projects and several industries, Todd has been Data Warehouse Architect, Technical Lead, DBA, Business Analyst, Developer, and Teacher. He lives in the vibrant Midwest and can be reached attschraml@ihsiq.com.
2004 Database Elaborations Archives:
A Name Confers Much More Than Mere Identity December 2004Dimensions are Not Entities, or Why Data Structures Aren’t Equal November 2004Anything Worth Doing Is Worth Doing Again October 2004Simplicity is in the Eyes of the Beholder September 2004The Object-Relational Divide Can Be Bridged August 2004Normal is as Normal Does July 2004Column Placement in Tables and Indexes Can Determine Performance June 2004Flexibility Has Its Costs May 2004Managing the Time Dimension Challenges Data Mart Builders April 2004To Avoid Inertia, Learn to Compromise March 2004Build Better Relationships through Data Modeling February 2004Talkin‘ ‘Bout Key G-G-Generation January 2004
2003 Database Elaborations Archives:
Full Table Scans Happen December 2003Better DBMS Domain Support Means Simpler Models November 2003The Process to Populate Fact Tables Must Adjust to Disruptions October 2003Multidimensional Nonloss Decomposition Made Simple September 2003Follow this Path to Painless and Almost Rational XML Structures August 2003You Have Heard It Before, But the Devil is Still in the Details July 2003Domains Require Subtlety June 2003Does the Term “Bureaucracy of Databases” Make You “Stutter”? May 2003Should You Change the Way You Think about Change? April 2003Your Database Models May Change Over Time, and That’s Good March 2003Beware of Developers Bearing Gifts of XML February 2003With Surrogate Keys, Use Caution January 2003
2002 Database Elaborations Archives:
What is a Database? The Answer is Not So Obvious December 2002No No Repeating Repeating Groups Groups November 2002Ambiguity Makes Gathering Data Knowledge a Distinct Challenge October 2002Thoughts of Database Portability Dance in My Head Sepetmber 2002Development Teams Stare Across the Object-Data Divide August 2002How Comprehensive are Your Dimensions? July 2002The Design Implications of a Hierarchy of One June 2002
_xyz