You use dimension tables to hold the descriptive attributes that you want to use to analyse your measures.
This is fine but what do you do when this information changes? There are three common ways that you can handle changes in your dimensional model.
Type 1 – Overwrite
In this method when a change happens you simply overwrite the dimension record with the new values. This is nice and simple however, the old values are gone.
In this example a saleswoman has recorded some sales.
SalesPersonDimension
SalesPersonKey | SalesPersonID | SalesPersonName |
---|---|---|
1 | 000001 | Mary Jones |
2 | 000002 | John Smith |
SalesFactTable
SalesPersonKey | SalesDate | SalesAmount |
---|---|---|
1 | 1-Feb-2008 | £10,000.00 |
2 | 1-Feb-2008 | £9,500.00 |
Of course I would never store a date in a fact table, I would use a foreign key to a date dimension. I am trying to keep the example as concise as possible.
Mary and John have been working very closely and they have decided to get married, accordingly Mary’s name changes.
SalesPersonDimension
SalesPersonKey | SalesPersonID | SalesPersonName |
---|---|---|
1 | 000001 | Mary Smith |
2 | 000002 | John Smith |
SalesFactTable
SalesPersonKey | SalesDate | SalesAmount |
---|---|---|
1 | 1-Feb-2008 | £10,000.00 |
2 | 1-Feb-2008 | £9,500.00 |
1 | 1-Mar-2008 | £11,000.00 |
2 | 1-Mar-2008 | £12,500.00 |
As you can see Mary’s name has been simply overwritten with the new value. However if I look back at February’s sales figures, Mary Smith had sales, but she was actually Mary Jones in February.
When using type 1 dimensions you need to be sure that you don’t mind the values being changed for all time!
Type 2 – Versioning
In this method your requirements state that you must accurately report on the information as it existed at the time. In order to do this, when the data changes you must keep the old version and then create a new version of the record with the updated values. In order to get best use from this I would always add three additional columns to my dimension: ValidFrom, ValidTo and CurrentFlag.
SalesPersonDimension
SalesPersonKey | SalesPersonID | SalesPersonName | ValidFrom | ValidTo | CurrentFlag |
---|---|---|---|---|---|
1 | 000001 | Mary Jones | 1-Jan-2005 | 31-Dec-9999 | Y |
2 | 000002 | John Smith | 1-Mar-2006 | 31-Dec-9999 | Y |
SalesFactTable
SalesPersonKey | SalesDate | SalesAmount |
---|---|---|
1 | 1-Feb-2008 | £10,000.00 |
2 | 1-Feb-2008 | £9,500.00 |
Mary and John have been working very closely and they have decided to get married, accordingly Mary’s name changes.
SalesPersonDimension
SalesPersonKey | SalesPersonID | SalesPersonName | ValidFrom | ValidTo | CurrentFlag |
---|---|---|---|---|---|
1 | 000001 | Mary Jones | 1-Mar-2006 | 1-Mar-2008 | N |
2 | 000002 | John Smith | 1-Mar-2006 | 31-Dec-9999 | Y |
3 | 000001 | Mary Smith | 1-Mar-2008 | 31-Dec-9999 | Y |
SalesFactTable
SalesPersonKey | SalesDate | SalesAmount |
---|---|---|
1 | 1-Feb-2008 | £10,000.00 |
2 | 1-Feb-2008 | £9,500.00 |
3 | 1-Mar-2008 | £11,000.00 |
2 | 1-Mar-2008 | £12,500.00 |
There are a few things to note here:
Dimension record #1 Mary’s name has not been changed but it has been end-dated to mark the period when this name was valid.
Dimension record #3 has been created with her new details and again you can see the period that this record is valid for.
Mary’s sales in the fact table, in February point to dimension #1, Mary Jones and in March point to dimension #3, Mary Smith.
Type 1 & Type 2 – The best of both worlds!
Depending on the attributes you are capturing, some of them may be very dynamic and change constantly, likewise there may be attributes that you do want to track changes in, and some that you do not. This can be catered for by using both of the schemes described above. Rather than considering the whole dimension to be type 1 or type 2, the attrubutes within the dimension will be of type 1 or type 2.
In this method your requirements state that you must accurately report on the information as it existed at the time. In order to do this, when the data changes you must keep the old version and then create a new version of the record with the updated values. In order to get best use from this I would
SalesPersonDimension
Sales Person Key |
Sales Person ID |
Sales Person Name |
Mobile Tel No |
Valid From |
Valid To |
Current Flag |
---|---|---|---|---|---|---|
1 | 000001 | Mary Jones | 07771 123 4567 | 1-Jan-2005 | 31-Dec-9999 | Y |
2 | 000002 | John Smith | 07772 321 7654 | 1-Mar-2006 | 31-Dec-9999 | Y |
SalesFactTable
SalesPersonKey | SalesDate | SalesAmount |
---|---|---|
1 | 1-Feb-2008 | £10,000.00 |
2 | 1-Feb-2008 | £9,500.00 |
Mary and John have been working very closely and they have decided to get married, accordingly Mary’s name changes. At the same time John gets a new mobile phone and his number changes.
SalesPersonDimension
Sales Person Key |
Sales Person ID |
Sales Person Name |
Mobile Tel No |
Valid From |
Valid To |
Current Flag |
---|---|---|---|---|---|---|
1 | 000001 | Mary Jones | 07771 123 4567 | 1-Mar-2006 | 1-Mar-2008 | N |
2 | 000002 | John Smith | 07775 987 2323 | 1-Mar-2006 | 31-Dec-9999 | Y |
3 | 000001 | Mary Smith | 07771 123 4567 | 1-Mar-2008 | 31-Dec-9999 | Y |
SalesFactTable
SalesPersonKey | SalesDate | SalesAmount |
---|---|---|
1 | 1-Feb-2008 | £10,000.00 |
2 | 1-Feb-2008 | £9,500.00 |
3 | 1-Mar-2008 | £11,000.00 |
2 | 1-Mar-2008 | £12,500.00 |
There are a few things to note here:
Mary’s changes have been recorded and tracked as it is important that we can see when the values changed.
The update to John’s mobile telephone number has simply been done “in place”, we don’t care what his number was last month, we just need to know what it’s current value is.
This is the most common implementation; with a dimension table holding a mixture of type 1 and type 2 attributes.
Type 3 – “Soft” or “Limited” Versioning
With type 2 changes there is a definite milestone event that causes the data to change. The data can change multiple times, maybe an unlimited number of times.
Another typical change is speculative reorganisation. “What If: I reorganised my sales team, how would that affect my numbers?”. You may not wish to “commit” these changes but you have to save them to be able to report on them.
SalesPersonDimension
Sales Person Key |
Sales Person ID |
Sales Person Name |
Mobile Tel No |
Sales Region |
New Region |
Valid From |
Valid To |
Current Flag |
---|---|---|---|---|---|---|---|---|
1 | 000001 | Mary Jones | 07771 123 4567 | South | 1-Jan-2005 | 31-Dec-9999 | Y | |
2 | 000002 | John Smith | 07772 321 7654 | North | 1-Mar-2006 | 31-Dec-9999 | Y |
SalesFactTable
SalesPersonKey | SalesDate | SalesAmount |
---|---|---|
1 | 1-Feb-2008 | £10,000.00 |
2 | 1-Feb-2008 | £9,500.00 |
Mary and John have been working very closely and they have decided to get married, accordingly Mary’s name changes. At the same time John gets a new mobile phone and his number changes. The sales director also wants to try reallocating his staff to different regions to see what impact this will have on the sales figures.
SalesPersonDimension
Sales Person Key |
Sales Person ID |
Sales Person Name |
Mobile Tel No |
Sales Region |
New Region |
Valid From |
Valid To |
Current Flag |
---|---|---|---|---|---|---|---|---|
1 | 000001 | Mary Jones | 07771 123 4567 | South | 1-Mar-2006 | 1-Mar-2008 | N | |
2 | 000002 | John Smith | 07775 987 2323 | North | South | 1-Mar-2006 | 31-Dec-9999 | Y |
3 | 000001 | Mary Smith | 07771 123 4567 | South | North | 1-Mar-2008 | 31-Dec-9999 | Y |
SalesFactTable
SalesPersonKey | SalesDate | SalesAmount |
---|---|---|
1 | 1-Feb-2008 | £10,000.00 |
2 | 1-Feb-2008 | £9,500.00 |
3 | 1-Mar-2008 | £11,000.00 |
2 | 1-Mar-2008 | £12,500.00 |
There are a few things to note here:
Two new columns have been added, SalesRegion and NewSalesRegion. SalesRegion will hold the “current” value and NewRegion will hold the proposed new value. Both of these columns are type 1 as we are not tracking changes to them. We could of course have created OldRegion, CurrentRegion and NewRegion. I’m sure you can see the problem, the more history we want to track the more columns we have to add and the more maintenance that has to be done to them: Old=North, Current=South becomes: Old=North, Current=South, New=West becomes Old=South, Current=West, New=East and so on.
The onus is on the report writer to choose which value they want to report on.
Of the three; type 3 is the least common but there are certain requirements where it is definitely the “correct” choice.
3 replies on “Modelling Change In Your Dimensions”
There is another method to handle the changes to Dimension values. The lineage of the dimension values would be tracked using the attributes âValid Fromâ? and âValid Toâ?, which would suffice to indicate the validity of the value at a point of time.
The reports would use these two date attributes to select the appropriate value, albeit without a flag. A â31-Dec-999â? in the âValid Toâ? column would indicate the record to be âCurrentâ.
Concisely, this would be Versioning without the flag.
You are quite correct in what you say.
What you describe is Type-2 Slowly Changing Dimensions.
You are also correct that you do not need the CurrentFlag attribute that it can be completely replaced with the predicate ValidTo=to_date(’31-Dec-9999′, ‘dd-Mon-YYYY’).
The reason that I tend to add this simple flag, one byte per record, is for simple ease of use. If all users will be using Business Objects then I would replace the physical attribute with a logical one that used the logic you described. If however users will be using other query tools such as SAS or even SQL then they have to “know” which date to use.
I would never question the competency of the people that will be querying these systems but the easier that we can make it the better!
Ross
I have just noticed that Ross Goodman at Ecelectic is blogging, so I’ll add him to the blogroll.
http://rossgoodman.com/2008/02/28/modelling-change-in-your-dimensions/
adrian.ward@addidici.com
http://www.addidici.com