Cube Desing Question - partioned by date db model
-
- Posts: 7
- Joined: Fri Apr 09, 2010 8:21 pm
- OLAP Product: TM1
- Version: 9.5
- Excel Version: 2003
Cube Desing Question - partioned by date db model
Hello Experts,
I have a challenging task, that is to build a historical cube off relational data model, which is partitioned by a date, so each day we have a new set of data in all reference and fact tables. In reality, no one would ever aggregate the data across multiple dates, but the user might need to report off each partition individually or compare the difference between 2 given dates. Normally users are only interested in 30 days of data. Each day is about 200k of rows.
Question: What are the modeling options that I have to create that historical cube?
I thought of some possible approaches:
#1. Create a master cube containing all data members. Each Dimension Element will have Partition Key (date_key) pre-appended to it's real element key in the Fact table and in reference tables, that way, when the user selects a reporting date, only data available for that partition will be displayed.
Challenges - Dimensions:
Since the aliases for each dimensions have to be unique, I would need to append Partition Key(date_key) to all dimension's elements and aliases, so a given dimension hierarchy will look like that:
Jan-01, 2010
->Company_Group 1 - Jan-01, 2010
->Company 1 - Jan 01, 2010
Jan-02, 2010
->Company_Group 1 - Jan-02, 2010
->Company 1 - Jan 02, 2010
Which is somewhat cumbersome, if dimensions are quite large.
#2. Create 30 individual cubes and link them to a single master cube (theoretical, have not thought of practical implementation yet).
Please assist me with that challenge, I am sure there are other options available on how to implement that...
Thank you,
Ann
I have a challenging task, that is to build a historical cube off relational data model, which is partitioned by a date, so each day we have a new set of data in all reference and fact tables. In reality, no one would ever aggregate the data across multiple dates, but the user might need to report off each partition individually or compare the difference between 2 given dates. Normally users are only interested in 30 days of data. Each day is about 200k of rows.
Question: What are the modeling options that I have to create that historical cube?
I thought of some possible approaches:
#1. Create a master cube containing all data members. Each Dimension Element will have Partition Key (date_key) pre-appended to it's real element key in the Fact table and in reference tables, that way, when the user selects a reporting date, only data available for that partition will be displayed.
Challenges - Dimensions:
Since the aliases for each dimensions have to be unique, I would need to append Partition Key(date_key) to all dimension's elements and aliases, so a given dimension hierarchy will look like that:
Jan-01, 2010
->Company_Group 1 - Jan-01, 2010
->Company 1 - Jan 01, 2010
Jan-02, 2010
->Company_Group 1 - Jan-02, 2010
->Company 1 - Jan 02, 2010
Which is somewhat cumbersome, if dimensions are quite large.
#2. Create 30 individual cubes and link them to a single master cube (theoretical, have not thought of practical implementation yet).
Please assist me with that challenge, I am sure there are other options available on how to implement that...
Thank you,
Ann
- rkaif
- Community Contributor
- Posts: 328
- Joined: Fri Sep 05, 2008 6:58 pm
- OLAP Product: IBM Cognos TM1
- Version: 9.1 or later
- Excel Version: 2003 or later
Re: Cube Desing Question - partioned by date db model
First you need a cube with dimensions like:
1. Company (will have elements like Company A, Company B.....)
2. Group (will have elements like Group A, Group B.....)
3. ....
4. .....
5. Period
6. Measures (example could be Price, Qty, Amount etc.)
Your Period/Time dimension will have hierarchies like Year -> Month -> Day. There is NO need to create your other dimensions with the date_key. For details see the following post: http://forums.olapforums.com/viewtopic.php?t=1936#p9925
Let me know if you need further explanation.
1. Company (will have elements like Company A, Company B.....)
2. Group (will have elements like Group A, Group B.....)
3. ....
4. .....
5. Period
6. Measures (example could be Price, Qty, Amount etc.)
Your Period/Time dimension will have hierarchies like Year -> Month -> Day. There is NO need to create your other dimensions with the date_key. For details see the following post: http://forums.olapforums.com/viewtopic.php?t=1936#p9925
Let me know if you need further explanation.
Cheers!
Rizwan Kaif
Rizwan Kaif
-
- MVP
- Posts: 2836
- Joined: Tue Feb 16, 2010 2:39 pm
- OLAP Product: TM1, Palo
- Version: Beginning of time thru 10.2
- Excel Version: 2003-2007-2010-2013
- Location: Atlanta, GA
- Contact:
Re: Cube Desing Question - partioned by date db model
Why make it so complicated? Don't muck up the other dimensions that are providing worthwhile slices of data. Just add a dimension to the cube called "Partition Day" with elements like D1, D2, D3, D4, etc. and load each days worth of data in a different element of this dimension. When I say each day's worth of data, I don't mean which date the numbers represent, I mean which partition the data came from. If you just want to keep a rolling 30 days worth of partitions have your TI load process query the dimension for the first and last element, add a new element with the next sequential number and lop off the oldest. For example, if you are now on Day 31, the TI process would be adding D31 and removing D1 so that you would now have D2 thru D31, and so forth. Since this isn't actually a date, it just represents a different block of data, you don't need to worry about the elements being called anything special and you don't really need a hierarchy.
-
- Posts: 7
- Joined: Fri Apr 09, 2010 8:21 pm
- OLAP Product: TM1
- Version: 9.5
- Excel Version: 2003
Re: Cube Desing Question - partioned by date db model
rkaif, tomok,
Thank you for your suggestions. It actually helped me to understand what I am really after much better. It's a bit confusing, but here is our db setup.
Each day a new partition is created for the full stack of data available on that day. Partitions in our case represent a version of the data on that day(it's a lazy way of keeping a history, instead of determining what's changed, new snapshot of data is created, regardless if it's changed or not, and appended to the same tables with a new partition_key). This is viewed as a reporting context. However we do have a separate date dimension - Reporting Date, which is indicative of the date we've received the exposure data. For each partition or context we will have a full stack of days - all reported exposures to date....
I've mocked up a simple star schema to create a visual... Each Dimension table have 2 keys that I would probably need to link the data with, actual Primary Key, like CompanyID and a context/version date - Partition_Key, hence creating a confusion...
Please help.
Regards,
Ann
Thank you for your suggestions. It actually helped me to understand what I am really after much better. It's a bit confusing, but here is our db setup.
Each day a new partition is created for the full stack of data available on that day. Partitions in our case represent a version of the data on that day(it's a lazy way of keeping a history, instead of determining what's changed, new snapshot of data is created, regardless if it's changed or not, and appended to the same tables with a new partition_key). This is viewed as a reporting context. However we do have a separate date dimension - Reporting Date, which is indicative of the date we've received the exposure data. For each partition or context we will have a full stack of days - all reported exposures to date....
I've mocked up a simple star schema to create a visual... Each Dimension table have 2 keys that I would probably need to link the data with, actual Primary Key, like CompanyID and a context/version date - Partition_Key, hence creating a confusion...
Please help.
Regards,
Ann
-
- Posts: 7
- Joined: Fri Apr 09, 2010 8:21 pm
- OLAP Product: TM1
- Version: 9.5
- Excel Version: 2003
Re: Cube Desing Question - partioned by date db model
Here is our schema sample. Thanks, Ann
-
- MVP
- Posts: 2836
- Joined: Tue Feb 16, 2010 2:39 pm
- OLAP Product: TM1, Palo
- Version: Beginning of time thru 10.2
- Excel Version: 2003-2007-2010-2013
- Location: Atlanta, GA
- Contact:
Re: Cube Desing Question - partioned by date db model
I don't understand what you're asking. If you asking for the structure of the cube that is easy, it should look just like your Exposure Fact Table, with a dimension for each of these fields. If it's something else you need to be more specific. I will tell you that you need to stop thinking about things in a relational database mind-set. OLAP cubes are different and require a new way of thinking. Good luck.
- rkaif
- Community Contributor
- Posts: 328
- Joined: Fri Sep 05, 2008 6:58 pm
- OLAP Product: IBM Cognos TM1
- Version: 9.1 or later
- Excel Version: 2003 or later
Re: Cube Desing Question - partioned by date db model
As we have explained you earlier that you do not need to have the Partition_Key in the Company, Party and the Product Dimension.ABRANDE wrote: I've mocked up a simple star schema to create a visual... Each Dimension table have 2 keys that I would probably need to link the data with, actual Primary Key, like CompanyID and a context/version date - Partition_Key, hence creating a confusion...
Hope it helps!
Cheers!
Rizwan Kaif
Rizwan Kaif
-
- Posts: 7
- Joined: Fri Apr 09, 2010 8:21 pm
- OLAP Product: TM1
- Version: 9.5
- Excel Version: 2003
Re: Cube Desing Question - partioned by date db model
Thank you, so much for your replies! I am really after figuring out how to best model the cube, dimensions, etc for this particular case...
I am trying to twist my brain a bit, to think in the OLAP terms...
Here is my question: If I drop the Partition Key from the Reference data tables, like Party, Company, Product, then I will only be able to load one version of the truth(reference data) for a given partition, since the Description for the same ProductID might change depending which partition users are looking at, but the ProductID will be the same... So If I load this sample dataset into the Product Table, without specifying partition then it will look like this:
ProductID: 1
ProductDescription: Bonds (description for partition date: Jan 01, 2010)
ProductID: 1
ProductDescription: Municipal Bonds (description for partition date: Jan 02, 2010)
Wouldn't it be a problem then? How would I then specify in the fact table which description needs to be picked up for a given partition, since the ProductIDs will be the same?
Thank you,
Ann
I am trying to twist my brain a bit, to think in the OLAP terms...
Here is my question: If I drop the Partition Key from the Reference data tables, like Party, Company, Product, then I will only be able to load one version of the truth(reference data) for a given partition, since the Description for the same ProductID might change depending which partition users are looking at, but the ProductID will be the same... So If I load this sample dataset into the Product Table, without specifying partition then it will look like this:
ProductID: 1
ProductDescription: Bonds (description for partition date: Jan 01, 2010)
ProductID: 1
ProductDescription: Municipal Bonds (description for partition date: Jan 02, 2010)
Wouldn't it be a problem then? How would I then specify in the fact table which description needs to be picked up for a given partition, since the ProductIDs will be the same?
Thank you,
Ann
-
- MVP
- Posts: 2836
- Joined: Tue Feb 16, 2010 2:39 pm
- OLAP Product: TM1, Palo
- Version: Beginning of time thru 10.2
- Excel Version: 2003-2007-2010-2013
- Location: Atlanta, GA
- Contact:
Re: Cube Desing Question - partioned by date db model
There you go again thinking relationally. In a relational DB you normalize your tables. In OLAP there is no such thing as normalization. If you have the possibility of having a different description for a product then "Product Description" becomes an element of the Measures dimension and you HAVE to populate it for each intersection in the cube. Most of the time metadata, such is account or product description would not change so that could become an attribute of the account number or product ID. In your case the description may vary by partition so you have to make a spot in the cube for it and then populate it. Yes, it is wasted space because the description MAY be the same for 99% of the products along the different partitions but if you want the flexibility of letting them be different you have no choice.ABRANDE wrote:Wouldn't it be a problem then? How would I then specify in the fact table which description needs to be picked up for a given partition, since the ProductIDs will be the same?
-
- Posts: 16
- Joined: Tue Nov 23, 2010 4:12 pm
- OLAP Product: Cognos TM1
- Version: 10.1.0
- Excel Version: 2007
- Location: Portland, OR USA
Re: Cube Desing Question - partioned by date db model
The technical name of your problem is slowly changing dimensions. When a user looks at the data they want to see the lookup values as they were at the time of the transaction.
Here are some random thoughts:
-- For changing hierarchies, have a seperate hierarchy for each partition. The lowest level elemnts do not need the partion key, but all the levels above them would need it as part of the name.
-- Have the lowest level elemnts contain the partition key as well, but have an alternate rollup that is set to the most current version of the hierarachy with the additional level which consolidates all the seperate partitioned keys into one element with the base primary id for the dimension. This will allow you to compare ids across different partitions.
-- For changing attributes, have a seperate attribute for each partition. For display purposes in the cube you could have a rule that looks up the correct attribute based on the selected partition.
-- Instead of using traditional attributes create a small cube for each dimension that also has the dimension, partition key, and attribute name measure dimension.
Hope some of these ideas is a help.
Here are some random thoughts:
-- For changing hierarchies, have a seperate hierarchy for each partition. The lowest level elemnts do not need the partion key, but all the levels above them would need it as part of the name.
-- Have the lowest level elemnts contain the partition key as well, but have an alternate rollup that is set to the most current version of the hierarachy with the additional level which consolidates all the seperate partitioned keys into one element with the base primary id for the dimension. This will allow you to compare ids across different partitions.
-- For changing attributes, have a seperate attribute for each partition. For display purposes in the cube you could have a rule that looks up the correct attribute based on the selected partition.
-- Instead of using traditional attributes create a small cube for each dimension that also has the dimension, partition key, and attribute name measure dimension.
Hope some of these ideas is a help.
- paulsimon
- MVP
- Posts: 808
- Joined: Sat Sep 03, 2011 11:10 pm
- OLAP Product: TM1
- Version: PA 2.0.5
- Excel Version: 2016
- Contact:
Re: Cube Desing Question - partioned by date db model
Hi
I would have two time dimensions in the cube, one for the Partition Date and one for the Reporting Date, plus the other dimensions. That will then give you all the flexibility that you require.
When you want to load data, add an element to the Partition Date dimension, and load.
When a Partition is too old to be kept just delete that element from the dimension and it will drop the data against that Partition.
Personally I would keep a Partition dimension with 30 static slots and eg PD_01 thru PD_30 and just change the alias on them to represent the Partition date, as it is best to avoid updating dimensions. However, that is probably only an issue if you are loading during the day.
Regards
Paul Simon
I would have two time dimensions in the cube, one for the Partition Date and one for the Reporting Date, plus the other dimensions. That will then give you all the flexibility that you require.
When you want to load data, add an element to the Partition Date dimension, and load.
When a Partition is too old to be kept just delete that element from the dimension and it will drop the data against that Partition.
Personally I would keep a Partition dimension with 30 static slots and eg PD_01 thru PD_30 and just change the alias on them to represent the Partition date, as it is best to avoid updating dimensions. However, that is probably only an issue if you are loading during the day.
Regards
Paul Simon
-
- Posts: 7
- Joined: Fri Apr 09, 2010 8:21 pm
- OLAP Product: TM1
- Version: 9.5
- Excel Version: 2003
Re: Cube Desing Question - partioned by date db model
Thank you for your suggestions. We ended up building a cube with SCD for all dimensions, except the one with a hierarchy (Party dim). I will need to remodel it later on...
The difficulty was -- we are reporting off ~50k parties each day (and out of ~50k only 200-300 names are changed daily), so setting element to be a combo of partyID+Partition_Key will not solve our Alias Problem, as we would run into duplicate Alias problem then, unless we append partition_key to each alias description (that I am sure users would be very unhappy with
)...
I do like the idea of the separate partition dimension, to have a better grip on data removal in the cube! I also tried to build a cube with party names (string values up to 100 chars...) as elements, and it seemed to function fine, as soon as I would add another dimension - Party Parent, using party name as an element, cube views would fail to get generated, with Out of Memory exception
...
Another reason, for not implementing SCD at that time, was that we had to apply element level security on this dimension... However, later on, we figured out a way to apply element level security on a different - smaller dimension, as custom security scripts were really time consuming due to the large number of elements in the original Party Dim. Setting up element level security was quite a challenge on it’s own
I've started writing a custom TI script of looping through the hierarchies, and appending start + end dates to the party names aliases that are changed or repointed, to indicate and preserve historical name/Aliases, as well as historical parent's name/Aliases (consolidations)...
The difficulty was -- we are reporting off ~50k parties each day (and out of ~50k only 200-300 names are changed daily), so setting element to be a combo of partyID+Partition_Key will not solve our Alias Problem, as we would run into duplicate Alias problem then, unless we append partition_key to each alias description (that I am sure users would be very unhappy with

I do like the idea of the separate partition dimension, to have a better grip on data removal in the cube! I also tried to build a cube with party names (string values up to 100 chars...) as elements, and it seemed to function fine, as soon as I would add another dimension - Party Parent, using party name as an element, cube views would fail to get generated, with Out of Memory exception

Another reason, for not implementing SCD at that time, was that we had to apply element level security on this dimension... However, later on, we figured out a way to apply element level security on a different - smaller dimension, as custom security scripts were really time consuming due to the large number of elements in the original Party Dim. Setting up element level security was quite a challenge on it’s own

I've started writing a custom TI script of looping through the hierarchies, and appending start + end dates to the party names aliases that are changed or repointed, to indicate and preserve historical name/Aliases, as well as historical parent's name/Aliases (consolidations)...
Re: Cube Desing Question - partioned by date db model
Yeah, I feel your pain. TM1 does not handle SCDs natively.
What we have done in some cases, and I know the traditional TM1ers would cringe at this, is create separate cubes and dimensions on the fly for each partition. The problem you are running into is that for you it is daily rather than where we have put this into play was monthly or even yearly.
As you stated, you are basically going to have to loop through the elements of the dimensions and see which ones have been updated and then rename them in a LIFO stack method, where the current member is always the active member. This is not an easy task because you are going to have to watch out with intermediate hierarchies, but it will give you the most compact SCD dimension.
By intermediate hierarchies, I mean this:
If you had two elements D and E with the following hierarchy:
A -> B -> C -> D and A -> B -> C -> E
The next load, you now had
A -> B -> C -> D and A -> X -> C-> E
You would have a very difficult time to manage this in TM1.
It would be better if you could do that in the database before TM1 gets ahold of it. One possible method might be:
I am sure there may be other/better ways to do this, but perhaps this may spark a few ideas.
What we have done in some cases, and I know the traditional TM1ers would cringe at this, is create separate cubes and dimensions on the fly for each partition. The problem you are running into is that for you it is daily rather than where we have put this into play was monthly or even yearly.
As you stated, you are basically going to have to loop through the elements of the dimensions and see which ones have been updated and then rename them in a LIFO stack method, where the current member is always the active member. This is not an easy task because you are going to have to watch out with intermediate hierarchies, but it will give you the most compact SCD dimension.
By intermediate hierarchies, I mean this:
If you had two elements D and E with the following hierarchy:
A -> B -> C -> D and A -> B -> C -> E
The next load, you now had
A -> B -> C -> D and A -> X -> C-> E
You would have a very difficult time to manage this in TM1.
It would be better if you could do that in the database before TM1 gets ahold of it. One possible method might be:
- Create a field called IsTM1Key as a boolean value.
- Create another field which is the alias (display value) that you want in TM1, say TM1DisplayName
- Clear IsTM1Key, and TM1DisplayName field from all records before updating from the master table.
- Rip through the master table and either flag the IsTM1Key with true where the values are the same, or add a new sequentially numbered key that is added only when the element changes.
- Rip through the TM1DisplayName column and create your unique key (i.e., as you had indicated Name + Date).
- Then re-load the entire dimension into TM1, using your sequential key as the element id and then the TM1DisplayName as the alias.
I am sure there may be other/better ways to do this, but perhaps this may spark a few ideas.
Re: Cube Desing Question - partioned by date db model
Oops, I missed somthing. The line:
should have been:blackhawk wrote: [*]Rip through the TM1DisplayName column and create your unique key (i.e., as you had indicated Name + Date).
- Rip through the TM1DisplayName column and create your unique key (i.e., as you had indicated Name + Date) where IsTM1Key is false.
-
- Community Contributor
- Posts: 211
- Joined: Tue Sep 15, 2009 11:13 pm
- OLAP Product: IBMPA
- Version: PA 2.0 Cloud
- Excel Version: 2010
Re: Cube Desing Question - partioned by date db model
I have a feeling our next project of the blocks is heading towards a SCD scenario so this is good info. Just a question on the last post blackhawk
if you
Also the hierarchy would be the latest, any previous level 1+ structures blown away? Or are there multiple hierarchies for each 'partition'?
if you
what would be the intention for comparison reporting across the new unique Tm1DisplayName and previous Tm1DisplayName/s ?Rip through the TM1DisplayName column and create your unique key (i.e., as you had indicated Name + Date) where IsTM1Key is false.
Also the hierarchy would be the latest, any previous level 1+ structures blown away? Or are there multiple hierarchies for each 'partition'?
GG
Re: Cube Desing Question - partioned by date db model
How about this idea:
Create 2 cubes, one fact cube with partition dimension, the other lookup cube with partition, productID with string data product description.
While fact cube is reporting, the productID and PartisionID jointly populate product description in one spreadsheet report. Similar to relational database joints concept.
Hope it is simple solution.
Jim
Create 2 cubes, one fact cube with partition dimension, the other lookup cube with partition, productID with string data product description.
While fact cube is reporting, the productID and PartisionID jointly populate product description in one spreadsheet report. Similar to relational database joints concept.
Hope it is simple solution.
Jim
-
- Posts: 7
- Joined: Fri Apr 09, 2010 8:21 pm
- OLAP Product: TM1
- Version: 9.5
- Excel Version: 2003
Re: Cube Desing Question - partioned by date db model
blackhawk - Thank you! Sounds great and less painful
, I do like that concept, as it is still easier for me to implement via sql then via ti scripting... I will try that in a few days...
Fortunately for us, we only have single parent hierarchies (multi-parents hierarchies are probably a bit more painful to implement, because you might need to deal with weights as well...) For now we are using fixed 2 level hierarchies. Though, I do unticipate users breaking from the fixed 2 level hierarchy concept and using the n-level/legal hierarchies in stead in the near future...
jimshen - Thank you for your suggestion... I will try that as well

Fortunately for us, we only have single parent hierarchies (multi-parents hierarchies are probably a bit more painful to implement, because you might need to deal with weights as well...) For now we are using fixed 2 level hierarchies. Though, I do unticipate users breaking from the fixed 2 level hierarchy concept and using the n-level/legal hierarchies in stead in the near future...
jimshen - Thank you for your suggestion... I will try that as well
-
- Community Contributor
- Posts: 211
- Joined: Tue Sep 15, 2009 11:13 pm
- OLAP Product: IBMPA
- Version: PA 2.0 Cloud
- Excel Version: 2010
Re: Cube Desing Question - partioned by date db model
We aim to create a staging cube for the lowest level data with the partition dimension (weekly timestamp -1 -52 weeks for financial year).
Then create reporting cubes off this base information. (similar to jimshen suggestion)
Two compensate for SCD we have a compromise at rollup Hiererachy level to take a snapshot quartely for new rollup, while retaining the previous hierarchy as a alternate (max 4 rollups). For the leaf node items name changes we will deal with in aliases but 'new' items will be tagged to current hierarchy in correct rollup and alternate (previous quarters) place in dummy dumping rollup.
Future development (next phase) will incorparate a relational database to manage the base data and changes (allowing history past one year to be retained), TM1 will only be the multidimensional reporting source for standard analytical requirements (we use CX Report Studio). Any tricky trend reporting will be managed over the relational view (using CX Report Studio).
Then create reporting cubes off this base information. (similar to jimshen suggestion)
Two compensate for SCD we have a compromise at rollup Hiererachy level to take a snapshot quartely for new rollup, while retaining the previous hierarchy as a alternate (max 4 rollups). For the leaf node items name changes we will deal with in aliases but 'new' items will be tagged to current hierarchy in correct rollup and alternate (previous quarters) place in dummy dumping rollup.
Future development (next phase) will incorparate a relational database to manage the base data and changes (allowing history past one year to be retained), TM1 will only be the multidimensional reporting source for standard analytical requirements (we use CX Report Studio). Any tricky trend reporting will be managed over the relational view (using CX Report Studio).
GG