Slowly changing dimensions are a fundamental part of any data warehouse, and they can present challenges for data engineers. In this blog post, we’ll explore four types of slowly changing dimensions and how to handle them in your ETL process.
What Are Slowly Changing Dimensions “SCD”
A slowly changing dimension (or SCD) is a type of dimension that changes over time. The key difference between an SCD and a regular dimension is that the structure of an SCD can change and therefore it is not static.
An example of this would be an employee’s job title at your company, which may change over time as they move around or upgrade to a higher management position.
If you wanted to store their current job title in your data warehouse, you would need a model to handle this variability over time.
SCDs are sometimes referred to as “versioned attributes” because they contain different versions of themselves at different times.
Each version contains both its own values as well as all previous versions, allowing users to view historical states for each attribute value independently if needed.
SCD Types
This post will discuss five types of slowly changing dimensions.
Type 0 – Does not overwrite existing values and simply ignores any changes to the original value.
Type 1 – Overwrites any changes to the existing values.
Type 2 – Adds a history record for each value.
Type 3 – Adds history as a separate column.
Type 4 – Adds a new dimension.
Let’s take a closer look at each type of slowly changing dimension.
SCD Type 0
As a starting point, we will focus on Type 0. In the data warehouse, these attributes will not be updated and will not change over time. For SCD Type 0, the record will not be subject to future change.
Two examples are the date of birth or the start date of employment. Therefore, we do not expect the data warehouse to change this dimension.
The disadvantage of type 0 is that historical data is not stored.
SCD Type 1
The SCD type 1 approach overwrites existing data with new data when historical data is not needed in the dimension table. This method is used to correct data errors in the dimension.
In the above figure, Kylo’s role has changed from BI Developer to Data Engineer. As a result of applying SCD Type 1, his position on the dimension table has been dynamically updated.
It might be helpful to use SCD Type 1 utilization in cases where we don’t need to analyze changes to the data. As a result, we are not required to know the history of the changes.
Applying type 1 might also involve correcting mistakes without regard to previous records.
SCD Type 2
So what is used when we don’t want to lose our history? Well, you can use the slowly changing dimension type 2. This allows us to keep track of changes over time without adding additional columns to the database tables.
Every time there is a change in the source system, a new row will be added to the data warehouse table. This row contains both the old and updated values for the column.
Type 2 SCDs have a dimension table that stores the entire history of data collected over time.
Type 2 data can be stored in three different ways.
Versioning: Keep track of changes with a version number.
Flagging: Keep track of changes with a flag.
Effective Date: Keep track of changes by starting with the start date and ending with the end date.
SDC Type 2 Disadvantages:
Keeping the entire history considerably increases the table’s size.
The performance and storage of a table may become a concern when it contains a large number of rows.
There are some challenges associated with implementing Type 2 in the ETL process.
The above SCD Type 2 example illustrates the application of flagging and effective date.
A new record will be inserted into the dimension table if Kylo’s position changes. Old records will be updated with flag value N, and the most recent record will have flag value Y.
As part of the Effective Date method, the EffectiveDate and EndDate columns in the dimension table are used to track the period of the change.
A NULL in the EndDate indicates that the current version of the data is available, while the remaining records indicate that the historical data is available.
SCD Type 3
In Type 3 Slowly Changing Dimension, two columns indicate the attribute of interest, one for the previous value and one for the current value. A column will also specify when the current value becomes active.
Disadvantages
It is not possible to keep an attribute in Type 3 if it has been changed more than once. For example, Kylo’s “BI developer” information will be lost if he is promoted to another role.
This example illustrates how the current status and effective date are applied to SCD Type 3.
Kylo’s Previous position, CurrentPosition, and EffectiveDate columns in the dimension tables are updated when his role changes from BI Developer to Data Engineer.
As a result, the type 3 method will only have a limited history, depending on the number of columns you create.
SCD Type 4
In SCD Type 4, the historical data is stored in a separate table. This is sometimes referred to as a historical table.
A dimension table is used to maintain the current data, while a history table is used to keep track of the history.
The main advantage of this approach is that it gives you far more flexibility in choosing how you want to handle updates to your dimension tables. Additionally, it enhances the performance of the report generation.
Type 4 can be helpful if you have large volumes of data because it reduces the time needed for updating the dimension tables.
Continuing our example with Kylo and his role change in the team:
Kylo’s new role as Data Engineer and EffectiveDate would be reflected in the dimension table in the data warehouse.
Kylo’s previous roles within the team and his EffectiveDate and EndDate would be contained in a separate historical table.
As a Final Note
SCDs can initially seem confusing, especially when starting with them.
Understanding the different types of SCD and how to apply them to your data warehouse will make it easier to begin working with slowly changing dimensions.
Having learned about SCDs, you can explain to interviewers at your next job interview why they are an essential topic and should be kept in mind.
We’ve covered all the basics about SCDs in this post, so now you’re ready to start using them!
Thanks for taking the time to read this post about SCD Types! I hope it has helped you achieve your goal of learning more about them!