Snowflake schema

The snowflake schema is an extension of the star schema and, like it, is used in OLAP and data warehousing.

In a star schema, the dimension tables are denormalized, which yields better processing speed at the expense of data integrity and storage space. In a snowflake schema, by contrast, the individual dimension tables are refined by classifying or normalizing them. This further branching of the data model produces the shape of a snowflake, from which the design pattern takes its name.

Because of this finer structuring, the data are less redundant than in a star schema, but queries may require additional join operations. A snowflake schema therefore results in smaller, better-structured data sets, but its relationships are more complex, which can lead to longer load or query times.

Definition

The snowflake schema is an extension of the star schema. Its fact table is the same as in the star schema. The dimension tables, however, are more finely differentiated: instead of holding all members of a dimension in one table, they hold data on the individual levels of the dimension hierarchies. The dimensions are refined by classifying or normalizing them. In either case, the dimension tables are split by attribute so that each level of a dimension can be represented in a separate table. In the common form of the snowflake schema, the data in the dimension tables are stored in third normal form (3NF). Normalization yields a separate table for each level of a dimension's hierarchy and therefore smaller, better-structured data. This further branching of the data model produces the snowflake shape that gives the schema its name.
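To make the one-table-per-level split concrete, here is a minimal Python sketch, assuming a dimension arrives as denormalized star-style rows listed from the lowest hierarchy level to the highest. The function name `normalize_dimension`, the integer surrogate keys, and the sample product rows are hypothetical. Each level becomes its own table, and every row carries a foreign key to its parent level.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# A level table maps surrogate key -> (member name, parent key or None).
LevelTable = Dict[int, Tuple[str, Optional[int]]]

# Hypothetical denormalized rows: (product, subcategory, category),
# i.e. lowest hierarchy level first, as in a star-schema dimension.
STAR_PRODUCT_ROWS = [
    ("Road-150", "Road Bikes", "Bikes"),
    ("Road-250", "Road Bikes", "Bikes"),
    ("Mountain-100", "Mountain Bikes", "Bikes"),
    ("Sport-100 Helmet", "Helmets", "Accessories"),
]


def normalize_dimension(rows: Sequence[Tuple[str, ...]], n_levels: int) -> List[LevelTable]:
    """Split denormalized dimension rows into one table per hierarchy level.

    tables[0] is the lowest level; rows of the top level have parent key None.
    """
    tables: List[LevelTable] = [{} for _ in range(n_levels)]
    # Per level: (member name, parent key) -> surrogate key, so equal names
    # under different parents remain distinct members.
    lookups: List[Dict[Tuple[str, Optional[int]], int]] = [{} for _ in range(n_levels)]
    for row in rows:
        parent_key: Optional[int] = None
        # Walk from the top of the hierarchy down to the lowest level.
        for level in reversed(range(n_levels)):
            natural_key = (row[level], parent_key)
            key = lookups[level].get(natural_key)
            if key is None:
                key = len(tables[level]) + 1  # next surrogate key for this level
                lookups[level][natural_key] = key
                tables[level][key] = natural_key
            parent_key = key  # becomes the foreign key of the next lower level
    return tables
```

With the sample rows, the star form stores 12 member names (4 rows × 3 levels), while the three level tables store 9 (4 products + 3 subcategories + 2 categories); the saving grows with the number of members that share a parent.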

Properties

  • Dimension tables: a primary key identifies the dimension values
  • The dimension hierarchy is represented by foreign keys
  • Normalization
  • Fact tables (as in the star schema): foreign keys to the dimension tables, i.e., the key of the lowest level of each dimension is stored in the fact table
  • The foreign keys to the dimensions together form the composite primary key of the facts
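The properties listed above map directly onto table definitions. The following is a sketch using SQLite through Python's standard `sqlite3` module; all table and column names (`category`, `subcategory`, `product`, `date_dim`, `sales_fact`) are hypothetical. Each dimension level has its own primary key, the hierarchy is expressed through foreign keys, and the fact table references only the lowest level of each dimension, with those foreign keys forming its composite primary key.

```python
import sqlite3

# Hypothetical snowflaked product dimension, a flat date dimension,
# and a fact table; names are illustrative only.
DDL = """
CREATE TABLE category (
    category_key INTEGER PRIMARY KEY,          -- identifies the dimension value
    name         TEXT NOT NULL
);
CREATE TABLE subcategory (
    subcategory_key INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    category_key    INTEGER NOT NULL REFERENCES category(category_key)  -- hierarchy via FK
);
CREATE TABLE product (
    product_key     INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    subcategory_key INTEGER NOT NULL REFERENCES subcategory(subcategory_key)
);
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY,              -- e.g. 20240131
    month    INTEGER NOT NULL,
    year     INTEGER NOT NULL
);
-- Fact table: foreign keys to the lowest level of each dimension,
-- which together form the composite primary key.
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product(product_key),
    date_key    INTEGER NOT NULL REFERENCES date_dim(date_key),
    amount      REAL NOT NULL,
    PRIMARY KEY (product_key, date_key)
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the schema in an in-memory database with FK checks enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn
```

With foreign-key enforcement switched on, inserting a subcategory that points at a nonexistent category, or a second fact row with the same (product, date) pair, raises `sqlite3.IntegrityError`.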

Pros and Cons

The following lists the advantages and disadvantages of the snowflake schema compared with the simple star schema:

Advantages

  • Lower memory consumption: normalization removes redundant data from the dimension tables.
  • n:m relationships between aggregation levels can be resolved through relation tables.
  • Optimal support for building aggregates.
  • Browsing functionality: frequent queries over very large dimension tables save time and run faster.
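The point about n:m relationships can be illustrated with a relation (bridge) table. This is a minimal SQLite sketch under a hypothetical scenario: a product may belong to several product groups, so the product level and the product-group level are related n:m. All table names and data are invented for illustration.

```python
import sqlite3

# Hypothetical scenario: products can belong to several product groups,
# so product -> product_group is n:m and is resolved by a relation table.
SETUP = """
CREATE TABLE product (product_key INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_group (group_key INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_group_map (                -- the relation table
    product_key INTEGER NOT NULL REFERENCES product(product_key),
    group_key   INTEGER NOT NULL REFERENCES product_group(group_key),
    PRIMARY KEY (product_key, group_key)
);
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product(product_key),
    amount      REAL NOT NULL
);

INSERT INTO product VALUES (1, 'Helmet'), (2, 'Bottle'), (3, 'Jersey');
INSERT INTO product_group VALUES (10, 'Safety'), (20, 'Summer');
-- The helmet is in both groups; bottle and jersey only in 'Summer'.
INSERT INTO product_group_map VALUES (1, 10), (1, 20), (2, 20), (3, 20);
INSERT INTO sales_fact VALUES (1, 40.0), (1, 60.0), (2, 5.0), (3, 30.0);
"""

# Aggregate facts to the group level by going through the relation table.
SALES_BY_GROUP = """
SELECT g.name, SUM(f.amount)
FROM sales_fact AS f
JOIN product_group_map AS m ON m.product_key = f.product_key
JOIN product_group     AS g ON g.group_key   = m.group_key
GROUP BY g.name
ORDER BY g.name
"""


def sales_by_group():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(SALES_BY_GROUP).fetchall()
```

Here `sales_by_group()` yields 100.0 for 'Safety' and 135.0 for 'Summer'. Because the helmet counts toward both groups, the group totals add up to more than the overall sales total of 135.0; with n:m hierarchies, aggregates are not additive across members of the upper level.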

Disadvantages

  • Speed disadvantage: additional joins between the dimension tables.
  • More complex structure: the finer structuring makes the data less redundant than in a star schema, but the relationships are more complex. Multi-level dimension tables must be linked through join queries, which can lead to longer query times in some circumstances.
  • More tables: the finer structure requires a greater number of tables.
  • Reorganization problem: changes to the semantic model lead to extensive reorganization of the tables and therefore to higher maintenance effort.
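The extra-join cost is easiest to see by answering the same question against both designs. This is a sketch in SQLite, assuming a tiny hypothetical product dimension stored once denormalized (`product_star`) and once normalized (`product`, `subcategory`, `category`); the names and figures are invented. Both queries compute sales per category, but the snowflake version needs three joins instead of one.

```python
import sqlite3

# Hypothetical data: the same product dimension stored once denormalized
# (star) and once normalized into level tables (snowflake).
SETUP = """
CREATE TABLE product_star (product_key INTEGER PRIMARY KEY, name TEXT,
                           subcategory TEXT, category TEXT);
CREATE TABLE category    (category_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE subcategory (subcategory_key INTEGER PRIMARY KEY, name TEXT,
                          category_key INTEGER REFERENCES category(category_key));
CREATE TABLE product     (product_key INTEGER PRIMARY KEY, name TEXT,
                          subcategory_key INTEGER REFERENCES subcategory(subcategory_key));
CREATE TABLE sales_fact  (product_key INTEGER, amount REAL);

INSERT INTO product_star VALUES (1, 'Road-150', 'Road Bikes', 'Bikes'),
                                (2, 'Sport-100 Helmet', 'Helmets', 'Accessories');
INSERT INTO category    VALUES (1, 'Bikes'), (2, 'Accessories');
INSERT INTO subcategory VALUES (1, 'Road Bikes', 1), (2, 'Helmets', 2);
INSERT INTO product     VALUES (1, 'Road-150', 1), (2, 'Sport-100 Helmet', 2);
INSERT INTO sales_fact  VALUES (1, 3000.0), (1, 500.0), (2, 35.0);
"""

# Star schema: one join from the fact table to the wide dimension table.
STAR_QUERY = """
SELECT p.category, SUM(f.amount) FROM sales_fact AS f
JOIN product_star AS p ON p.product_key = f.product_key
GROUP BY p.category ORDER BY p.category
"""

# Snowflake schema: the category level is two hops further away.
SNOWFLAKE_QUERY = """
SELECT c.name, SUM(f.amount) FROM sales_fact AS f
JOIN product     AS p ON p.product_key     = f.product_key
JOIN subcategory AS s ON s.subcategory_key = p.subcategory_key
JOIN category    AS c ON c.category_key    = s.category_key
GROUP BY c.name ORDER BY c.name
"""


def run(query: str):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(query).fetchall()
```

Both `run(STAR_QUERY)` and `run(SNOWFLAKE_QUERY)` return the same totals (35.0 for 'Accessories', 3500.0 for 'Bikes'); only the join path differs.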

Star schema vs. snowflake schema (normalized)

Star schema (denormalized):
  • User-friendly queries (aggregate access; simple, intuitive data model)
  • Simple, clear and standardized data model
  • One fact table and a few dimension tables

Snowflake schema (normalized):
  • Redundancy minimized through normalization
  • Efficient transaction processing
  • Complex and specific schema
  • Many entities and relationships in large data models

Example

The example shows the linked tables needed for a complete description of the product dimension in Microsoft's sample data warehouse project AdventureWorks. The category and subcategory of a product must be included in the Product dimension. This information is not stored directly in the main table of the Product dimension. Instead, a foreign key relationship from the Product dimension to the Product Subcategory table, which in turn has a foreign key relationship to the Product Category table, makes it possible to include the category and subcategory information in the product dimension.
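The foreign-key chain described above can be followed in a query or view that presents the snowflaked tables as one flat product dimension. This SQLite sketch uses a small column subset modeled on the AdventureWorks data warehouse tables `DimProduct`, `DimProductSubcategory` and `DimProductCategory`; the real tables have many more columns, the sample rows are made up, and the view name `ProductWithHierarchy` is hypothetical.

```python
import sqlite3

# Column subset modeled on the AdventureWorks data warehouse sample;
# the real tables contain many more columns, and these rows are made up.
SETUP = """
CREATE TABLE DimProductCategory (
    ProductCategoryKey         INTEGER PRIMARY KEY,
    EnglishProductCategoryName TEXT NOT NULL
);
CREATE TABLE DimProductSubcategory (
    ProductSubcategoryKey         INTEGER PRIMARY KEY,
    EnglishProductSubcategoryName TEXT NOT NULL,
    ProductCategoryKey            INTEGER REFERENCES DimProductCategory(ProductCategoryKey)
);
CREATE TABLE DimProduct (
    ProductKey            INTEGER PRIMARY KEY,
    EnglishProductName    TEXT NOT NULL,
    ProductSubcategoryKey INTEGER REFERENCES DimProductSubcategory(ProductSubcategoryKey)
);

-- Hypothetical view: follows both foreign keys to present the snowflaked
-- product dimension as one flat, star-style table. LEFT JOIN keeps
-- products that have no subcategory assigned.
CREATE VIEW ProductWithHierarchy AS
SELECT p.ProductKey,
       p.EnglishProductName,
       s.EnglishProductSubcategoryName,
       c.EnglishProductCategoryName
FROM DimProduct AS p
LEFT JOIN DimProductSubcategory AS s ON s.ProductSubcategoryKey = p.ProductSubcategoryKey
LEFT JOIN DimProductCategory    AS c ON c.ProductCategoryKey    = s.ProductCategoryKey;

INSERT INTO DimProductCategory    VALUES (1, 'Bikes');
INSERT INTO DimProductSubcategory VALUES (2, 'Road Bikes', 1);
INSERT INTO DimProduct            VALUES (100, 'Road-150', 2),
                                         (101, 'Spare part', NULL);
"""


def product_rows():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn.execute(
        "SELECT * FROM ProductWithHierarchy ORDER BY ProductKey"
    ).fetchall()
```

A view like this gives report writers the convenience of a star-style dimension while the data stay normalized underneath; the joins still run on every query against the view.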

Compared with the star schema, the number of joins required in a snowflake schema grows linearly with the number of aggregation paths.
