photo credit: Felipe Morin
I am often asked by my colleagues to explain dimensional modelling concepts to them. I will try to capture come on the concepts here. I am going to focus on Facts.
Specifically, types of facts: Accumulating, Factless, Transactions, Snapshots, Additive, SemiAdditive and Non Additive
Accumulating Fact Table
An accumulating fact table is where all of the dimensional attributes are not available at the time of creation and the dimensions that are linked to a fact table change over time. The most common implementation of this is in the recording of dates against facts.
Take a “Sales” fact, typical dates you may be intersted in when tracking an individual sale is maybe, order_date, ship_date, delivery_date and payment_date. These would not all be available when the fact is first created. Over time the fact record would accumulate more relationships with the dimensions as the relevant date milestones were passed for the sale.
A factless fact is where the fact does not store an actual numerical measure, the mere existance of a fact record indicates that an event has happened that you wish to track.
The classic example of this would be an “Attendance” fact. If you had dimensions to record date, scheduled_course, instructor and delegate then you could create a fact table that held the permutations of these dimensions. From this you could evaluate the number of courses you run, the number of delegates, the number of courses by instructor etc.
I would never simply leave a factess fact as a bare collection of foreign key columns I would always add a dummy measure column which would be set to 1 which you would then sum.
This is the most common type of fact. You would declare the grain of the fact, ie the level of detail and then this is what would be stored. For example you may have a sales_order fact, every time a new sales order a new row would be created in the fact table. alternatively you may have a “monthly_sales” fact. At the end of every month you would aggregate up all the sales that happened in that month and record the single total value.
The snapshot fact contains a reflection of the state of an entity at a given point in time. A classic example of this would be a “daily_balance” fact in a banking system. This would, on a daily basis record the balance of each account, it would NOT list the individual transactions that happened on the account.
A fully additive fact is one where the measures can be aggregated.
For example our Sales fact above would be fully additive as you can aggregate the sales amout over time, by product, by region or by salesman and still get the correct answer.
Semi Additive Facts
A semi additive fact is one where the measure can either have only a subset of aggregations applied to it, it you can count the measures but not sum them, or the measures are only additive over a subset of the dimensions.
Using our “daily_balances” fact above would be a good example of a semi additive fact. The daily balances can be aggregated by customer if the customer has multiple accounts to give the customrs daily balance, however the balances could not be aggregated over time as adding last weeks balance onto this weeks balance would result in a nonsensical figure.
Non Additive Facts
A non additive fact is one where the measure is non aggregable over any dimensions. These are commonly where percentages have been calculated and stored in the fact. Another example could be a profit margin on a sale, there is this figure other than at an individual sale level.
One reply on “Dimensional Modelling – Facts”
Nice post! Here’s one for ya… In a hierarchical organization, the higher the level, the greater the confusion.