I was recently asked how to include a recursive hierarchy into a dimensional model.
What do I mean by a recursive hierarchy? This is when an entity (a table) relates to it’s self. Take the following example from an Oracle database:
EMPNO | ENAME | JOB | MGR | HIREDATE | SAL | COMM | DEPTNO |
---|---|---|---|---|---|---|---|
7369 | ‘SMITH’ | ‘CLERK’ | 7902 | ’17-DEC-1980′ | 800 | NULL | 20 |
7499 | ‘ALLEN’ | ‘SALESMAN’ | 7698 | ’20-FEB-1981′ | 1600 | 300 | 30 |
7521 | ‘WARD’ | ‘SALESMAN’ | 7698 | ’22-FEB-1981′ | 1250 | 500 | 30 |
7566 | ‘JONES’ | ‘MANAGER’ | 7839 | ‘2-APR-1981’ | 2975 | NULL | 20 |
7654 | ‘MARTIN’ | ‘SALESMAN’ | 7698 | ’28-SEP-1981′ | 1250 | 1400 | 30 |
7698 | ‘BLAKE’ | ‘MANAGER’ | 7839 | ‘1-MAY-1981’ | 2850 | NULL | 30 |
7782 | ‘CLARK’ | ‘MANAGER’ | 7839 | ‘9-JUN-1981’ | 2450 | NULL | 10 |
7788 | ‘SCOTT’ | ‘ANALYST’ | 7566 | ’09-DEC-1982′ | 3000 | NULL | 20 |
7839 | ‘KING’ | ‘PRESIDENT’ | NULL | ’17-NOV-1981′ | 5000 | NULL | 10 |
7844 | ‘TURNER’ | ‘SALESMAN’ | 7698 | ‘8-SEP-1981’ | 1500 | 0 | 30 |
7876 | ‘ADAMS’ | ‘CLERK’ | 7788 | ’12-JAN-1983′ | 1100 | NULL | 20 |
7900 | ‘JAMES’ | ‘CLERK’ | 7698 | ‘3-DEC-1981’ | 950 | NULL | 30 |
7902 | ‘FORD’ | ‘ANALYST’ | 7566 | ‘3-DEC-1981’ | 3000 | NULL | 20 |
7934 | ‘MILLER’ | ‘CLERK’ | 7782 | ’23-JAN-1982′ | 1300 | NULL | 10 |
From this we can see that there is a recursive hierarchy within this table between the empno column and the mgr column, the mgr column shows the employee that is the manager for each row.
For example:
7934-Miller is managed by 7782-Clark who in turn is managed by 7839-King who is the president.
ie:
7839-King | 7566-Jones | 7788-Scott | 7876-Adams |
. | . | 7902-Ford | 7369-Smith |
. | 7698-Blake | 7499-Allen 7521-Ward 7654-Martin 7844-Turner 7900-James |
. |
. | 7782-Clark | 7934-Miller | . |
From this source data I’m going to create an employee dimension:
EMPNO | ENAME | JOB | MGR | HIREDATE |
---|---|---|---|---|
7369 | ‘SMITH’ | ‘CLERK’ | 7902 | ’17-DEC-1980′ |
7499 | ‘ALLEN’ | ‘SALESMAN’ | 7698 | ’20-FEB-1981′ |
7521 | ‘WARD’ | ‘SALESMAN’ | 7698 | ’22-FEB-1981′ |
7566 | ‘JONES’ | ‘MANAGER’ | 7839 | ‘2-APR-1981’ |
7654 | ‘MARTIN’ | ‘SALESMAN’ | 7698 | ’28-SEP-1981′ |
7698 | ‘BLAKE’ | ‘MANAGER’ | 7839 | ‘1-MAY-1981’ |
7782 | ‘CLARK’ | ‘MANAGER’ | 7839 | ‘9-JUN-1981’ |
7788 | ‘SCOTT’ | ‘ANALYST’ | 7566 | ’09-DEC-1982′ |
7839 | ‘KING’ | ‘PRESIDENT’ | NULL | ’17-NOV-1981′ |
7844 | ‘TURNER’ | ‘SALESMAN’ | 7698 | ‘8-SEP-1981’ |
7876 | ‘ADAMS’ | ‘CLERK’ | 7788 | ’12-JAN-1983′ |
7900 | ‘JAMES’ | ‘CLERK’ | 7698 | ‘3-DEC-1981’ |
7902 | ‘FORD’ | ‘ANALYST’ | 7566 | ‘3-DEC-1981’ |
7934 | ‘MILLER’ | ‘CLERK’ | 7782 | ’23-JAN-1982′ |
And an Employee Fact:
EMPNO | SAL | COMM |
---|---|---|
7369 | 800 | NULL |
7499 | 1600 | 300 |
7521 | 1250 | 500 |
7566 | 2975 | NULL |
7654 | 1250 | 1400 |
7698 | 2850 | NULL |
7782 | 2450 | NULL |
7788 | 3000 | NULL |
7839 | 5000 | NULL |
7844 | 1500 | 0 |
7876 | 1100 | NULL |
7900 | 950 | NULL |
7902 | 3000 | NULL |
7934 | 1300 | NULL |
From this we can see each employees salary and commission by joining on the employee number.
What we cannot see is what 7782-Clarks salary bill is, ie, his own salary and all of his subordinates. This is what we need to model.
This is resolved by builting an intermediate “bridging” table. This bridging table is used to hold every valid permutation of the relationships. You must remember to include the reflective relationship: each rows relationship with it’s self, to get each employees “own” salary and the transitive relationship, each rows relationship with its sibling records. This would give us:
Parent | Child | Levels |
---|---|---|
7839 | 7839 | 0 |
7839 | 7566 | 1 |
7839 | 7788 | 2 |
7839 | 7876 | 3 |
7839 | 7902 | 2 |
7839 | 7369 | 3 |
7839 | 7698 | 1 |
7839 | 7499 | 2 |
7839 | 7521 | 2 |
7839 | 7654 | 2 |
7839 | 7844 | 2 |
7839 | 7900 | 2 |
7839 | 7782 | 1 |
7839 | 7934 | 2 |
7566 | 7566 | 0 |
7566 | 7788 | 1 |
7566 | 7876 | 2 |
7566 | 7902 | 1 |
7566 | 7369 | 2 |
7788 | 7788 | 0 |
7788 | 7876 | 1 |
7876 | 7876 | 0 |
7902 | 7902 | 0 |
7902 | 7369 | 1 |
7698 | 7698 | 0 |
7698 | 7499 | 1 |
7698 | 7521 | 1 |
7698 | 7654 | 1 |
7698 | 7844 | 1 |
7698 | 7900 | 1 |
7499 | 7499 | 0 |
7521 | 7521 | 0 |
7654 | 7654 | 0 |
7844 | 7844 | 0 |
7900 | 7900 | 0 |
7782 | 7782 | 0 |
7782 | 7934 | 1 |
7934 | 7934 | 0 |
This maps out every permutation of parent to child and the number of levels between them. The parent key will link to the dimension table and the child key will link to the fact table.
You can also add a “BottomLevel” flag to indicate the bottom level nodes if this is a reporting requirement.
The level field can be used, for example, 0 will give a direct mapping, ie show employees their own records. You could use level = 1 to see one level down, ie staff who report directly to the parent. You could ignore the level field and see all subordinate records.
All of the samples have been directly typed rather than being copied from the Oracle emp table so there is the chance of some transcription errors. If you find any please let me know and I’ll correct them.
4 replies on “Dimensionally Modelling A Recursive Hierarchy”
Jings, this is a bit of a deep topic for a blog Ross !! It’s amazing what a bit of googling for Dimensional Modelling can turn up !!
Take care, AB =:-)
I know, a colleague asked me how this could be done and this was the easiest way to get it to him.
If it drives more traffic to my site, all the better !
There is a correction required in the emp number for clark in the below like from 7762 to 7782
“For example:
7934-Miller is managed by 7762[7782]-Clark who in turn is managed by 7839-King who is the president.
ie:”
Thanks very much for that.
I have corrected the original text.
Ross