Cube Desing Question - partioned by date db model

ABRANDE · Post by **ABRANDE** » Wed Oct 13, 2010 9:23 pm

Hello Experts,

I have a challenging task, that is to build a historical cube off relational data model, which is partitioned by a date, so each day we have a new set of data in all reference and fact tables. In reality, no one would ever aggregate the data across multiple dates, but the user might need to report off each partition individually or compare the difference between 2 given dates. Normally users are only interested in 30 days of data. Each day is about 200k of rows.
Question: What are the modeling options that I have to create that historical cube?
I thought of some possible approaches:
#1. Create a master cube containing all data members. Each Dimension Element will have Partition Key (date_key) pre-appended to it's real element key in the Fact table and in reference tables, that way, when the user selects a reporting date, only data available for that partition will be displayed.
Challenges - Dimensions:
Since the aliases for each dimensions have to be unique, I would need to append Partition Key(date_key) to all dimension's elements and aliases, so a given dimension hierarchy will look like that:

Jan-01, 2010
->Company_Group 1 - Jan-01, 2010
->Company 1 - Jan 01, 2010
Jan-02, 2010
->Company_Group 1 - Jan-02, 2010
->Company 1 - Jan 02, 2010
Which is somewhat cumbersome, if dimensions are quite large.

#2. Create 30 individual cubes and link them to a single master cube (theoretical, have not thought of practical implementation yet).

Please assist me with that challenge, I am sure there are other options available on how to implement that...

Thank you,
Ann

rkaif · Post by **rkaif** » Wed Oct 13, 2010 9:51 pm

First you need a cube with dimensions like:

1. Company (will have elements like Company A, Company B.....)
2. Group (will have elements like Group A, Group B.....)
3. ....
4. .....
5. Period
6. Measures (example could be Price, Qty, Amount etc.)

Your Period/Time dimension will have hierarchies like Year -> Month -> Day. There is NO need to create your other dimensions with the date_key. For details see the following post: http://forums.olapforums.com/viewtopic.php?t=1936#p9925

Let me know if you need further explanation.

tomok · Post by **tomok** » Wed Oct 13, 2010 10:39 pm

Why make it so complicated? Don't muck up the other dimensions that are providing worthwhile slices of data. Just add a dimension to the cube called "Partition Day" with elements like D1, D2, D3, D4, etc. and load each days worth of data in a different element of this dimension. When I say each day's worth of data, I don't mean which date the numbers represent, I mean which partition the data came from. If you just want to keep a rolling 30 days worth of partitions have your TI load process query the dimension for the first and last element, add a new element with the next sequential number and lop off the oldest. For example, if you are now on Day 31, the TI process would be adding D31 and removing D1 so that you would now have D2 thru D31, and so forth. Since this isn't actually a date, it just represents a different block of data, you don't need to worry about the elements being called anything special and you don't really need a hierarchy.

ABRANDE · Post by **ABRANDE** » Thu Oct 14, 2010 3:20 pm

rkaif, tomok,

Thank you for your suggestions. It actually helped me to understand what I am really after much better. It's a bit confusing, but here is our db setup.
Each day a new partition is created for the full stack of data available on that day. Partitions in our case represent a version of the data on that day(it's a lazy way of keeping a history, instead of determining what's changed, new snapshot of data is created, regardless if it's changed or not, and appended to the same tables with a new partition_key). This is viewed as a reporting context. However we do have a separate date dimension - Reporting Date, which is indicative of the date we've received the exposure data. For each partition or context we will have a full stack of days - all reported exposures to date....
I've mocked up a simple star schema to create a visual... Each Dimension table have 2 keys that I would probably need to link the data with, actual Primary Key, like CompanyID and a context/version date - Partition_Key, hence creating a confusion...
Please help.

Regards,
Ann

ABRANDE · Post by **ABRANDE** » Thu Oct 14, 2010 3:24 pm

Here is our schema sample. Thanks, Ann

: SampleStartSchema.JPG (52.73 KiB) Viewed 14813 times

tomok · Post by **tomok** » Thu Oct 14, 2010 3:47 pm

I don't understand what you're asking. If you asking for the structure of the cube that is easy, it should look just like your Exposure Fact Table, with a dimension for each of these fields. If it's something else you need to be more specific. I will tell you that you need to stop thinking about things in a relational database mind-set. OLAP cubes are different and require a new way of thinking. Good luck.

rkaif · Post by **rkaif** » Thu Oct 14, 2010 4:57 pm

ABRANDE wrote: I've mocked up a simple star schema to create a visual... Each Dimension table have 2 keys that I would probably need to link the data with, actual Primary Key, like CompanyID and a context/version date - Partition_Key, hence creating a confusion...

As we have explained you earlier that you do not need to have the Partition_Key in the Company, Party and the Product Dimension.

Hope it helps!

ABRANDE · Post by **ABRANDE** » Thu Oct 14, 2010 8:40 pm

Thank you, so much for your replies! I am really after figuring out how to best model the cube, dimensions, etc for this particular case...

I am trying to twist my brain a bit, to think in the OLAP terms...

Here is my question: If I drop the Partition Key from the Reference data tables, like Party, Company, Product, then I will only be able to load one version of the truth(reference data) for a given partition, since the Description for the same ProductID might change depending which partition users are looking at, but the ProductID will be the same... So If I load this sample dataset into the Product Table, without specifying partition then it will look like this:
ProductID: 1
ProductDescription: Bonds (description for partition date: Jan 01, 2010)
ProductID: 1
ProductDescription: Municipal Bonds (description for partition date: Jan 02, 2010)

Wouldn't it be a problem then? How would I then specify in the fact table which description needs to be picked up for a given partition, since the ProductIDs will be the same?

Thank you,
Ann

tomok · Post by **tomok** » Thu Oct 14, 2010 9:02 pm

ABRANDE wrote:Wouldn't it be a problem then? How would I then specify in the fact table which description needs to be picked up for a given partition, since the ProductIDs will be the same?

There you go again thinking relationally. In a relational DB you normalize your tables. In OLAP there is no such thing as normalization. If you have the possibility of having a different description for a product then "Product Description" becomes an element of the Measures dimension and you HAVE to populate it for each intersection in the cube. Most of the time metadata, such is account or product description would not change so that could become an attribute of the account number or product ID. In your case the description may vary by partition so you have to make a spot in the cube for it and then populate it. Yes, it is wasted space because the description MAY be the same for 99% of the products along the different partitions but if you want the flexibility of letting them be different you have no choice.

Terramup · Post by **Terramup** » Wed Mar 30, 2011 3:53 pm

The technical name of your problem is slowly changing dimensions. When a user looks at the data they want to see the lookup values as they were at the time of the transaction.

Here are some random thoughts:
-- For changing hierarchies, have a seperate hierarchy for each partition. The lowest level elemnts do not need the partion key, but all the levels above them would need it as part of the name.
-- Have the lowest level elemnts contain the partition key as well, but have an alternate rollup that is set to the most current version of the hierarachy with the additional level which consolidates all the seperate partitioned keys into one element with the base primary id for the dimension. This will allow you to compare ids across different partitions.
-- For changing attributes, have a seperate attribute for each partition. For display purposes in the cube you could have a rule that looks up the correct attribute based on the selected partition.
-- Instead of using traditional attributes create a small cube for each dimension that also has the dimension, partition key, and attribute name measure dimension.

Hope some of these ideas is a help.

paulsimon · Post by **paulsimon** » Wed Mar 30, 2011 7:54 pm

Hi

I would have two time dimensions in the cube, one for the Partition Date and one for the Reporting Date, plus the other dimensions. That will then give you all the flexibility that you require.

When you want to load data, add an element to the Partition Date dimension, and load.

When a Partition is too old to be kept just delete that element from the dimension and it will drop the data against that Partition.

Personally I would keep a Partition dimension with 30 static slots and eg PD_01 thru PD_30 and just change the alias on them to represent the Partition date, as it is best to avoid updating dimensions. However, that is probably only an issue if you are loading during the day.

Regards

Paul Simon

ABRANDE · Post by **ABRANDE** » Tue May 31, 2011 9:29 pm

Thank you for your suggestions. We ended up building a cube with SCD for all dimensions, except the one with a hierarchy (Party dim). I will need to remodel it later on...
The difficulty was -- we are reporting off ~50k parties each day (and out of ~50k only 200-300 names are changed daily), so setting element to be a combo of partyID+Partition_Key will not solve our Alias Problem, as we would run into duplicate Alias problem then, unless we append partition_key to each alias description (that I am sure users would be very unhappy with

)...
I do like the idea of the separate partition dimension, to have a better grip on data removal in the cube! I also tried to build a cube with party names (string values up to 100 chars...) as elements, and it seemed to function fine, as soon as I would add another dimension - Party Parent, using party name as an element, cube views would fail to get generated, with Out of Memory exception

...
Another reason, for not implementing SCD at that time, was that we had to apply element level security on this dimension... However, later on, we figured out a way to apply element level security on a different - smaller dimension, as custom security scripts were really time consuming due to the large number of elements in the original Party Dim. Setting up element level security was quite a challenge on it’s own

I've started writing a custom TI script of looping through the hierarchies, and appending start + end dates to the party names aliases that are changed or repointed, to indicate and preserve historical name/Aliases, as well as historical parent's name/Aliases (consolidations)...

blackhawk · Post by **blackhawk** » Tue May 31, 2011 10:22 pm

Yeah, I feel your pain. TM1 does not handle SCDs natively.

What we have done in some cases, and I know the traditional TM1ers would cringe at this, is create separate cubes and dimensions on the fly for each partition. The problem you are running into is that for you it is daily rather than where we have put this into play was monthly or even yearly.

As you stated, you are basically going to have to loop through the elements of the dimensions and see which ones have been updated and then rename them in a LIFO stack method, where the current member is always the active member. This is not an easy task because you are going to have to watch out with intermediate hierarchies, but it will give you the most compact SCD dimension.

By intermediate hierarchies, I mean this:

If you had two elements D and E with the following hierarchy:
A -> B -> C -> D and A -> B -> C -> E

The next load, you now had
A -> B -> C -> D and A -> X -> C-> E

You would have a very difficult time to manage this in TM1.

It would be better if you could do that in the database before TM1 gets ahold of it. One possible method might be:

Create a field called IsTM1Key as a boolean value.
Create another field which is the alias (display value) that you want in TM1, say TM1DisplayName
Clear IsTM1Key, and TM1DisplayName field from all records before updating from the master table.
Rip through the master table and either flag the IsTM1Key with true where the values are the same, or add a new sequentially numbered key that is added only when the element changes.
Rip through the TM1DisplayName column and create your unique key (i.e., as you had indicated Name + Date).
Then re-load the entire dimension into TM1, using your sequential key as the element id and then the TM1DisplayName as the alias.

This (using a similar technique) gives you a way to manage the intermediate hierarchies, that would normally be very difficult to do post dimension build in TM1.

I am sure there may be other/better ways to do this, but perhaps this may spark a few ideas.

blackhawk · Post by **blackhawk** » Tue May 31, 2011 10:27 pm

Oops, I missed somthing. The line:

blackhawk wrote: [*]Rip through the TM1DisplayName column and create your unique key (i.e., as you had indicated Name + Date).

should have been:

Rip through the TM1DisplayName column and create your unique key (i.e., as you had indicated Name + Date) where IsTM1Key is false.

Good luck!

BigG · Post by **BigG** » Wed Jun 01, 2011 12:06 am

I have a feeling our next project of the blocks is heading towards a SCD scenario so this is good info. Just a question on the last post blackhawk

if you

Rip through the TM1DisplayName column and create your unique key (i.e., as you had indicated Name + Date) where IsTM1Key is false.

what would be the intention for comparison reporting across the new unique Tm1DisplayName and previous Tm1DisplayName/s ?

Also the hierarchy would be the latest, any previous level 1+ structures blown away? Or are there multiple hierarchies for each 'partition'?

jimshen · Post by **jimshen** » Wed Jun 01, 2011 6:53 am

How about this idea:

Create 2 cubes, one fact cube with partition dimension, the other lookup cube with partition, productID with string data product description.

While fact cube is reporting, the productID and PartisionID jointly populate product description in one spreadsheet report. Similar to relational database joints concept.

Hope it is simple solution.

Jim

ABRANDE · Post by **ABRANDE** » Wed Jun 01, 2011 9:17 pm

blackhawk - Thank you! Sounds great and less painful

, I do like that concept, as it is still easier for me to implement via sql then via ti scripting... I will try that in a few days...
Fortunately for us, we only have single parent hierarchies (multi-parents hierarchies are probably a bit more painful to implement, because you might need to deal with weights as well...) For now we are using fixed 2 level hierarchies. Though, I do unticipate users breaking from the fixed 2 level hierarchy concept and using the n-level/legal hierarchies in stead in the near future...

jimshen - Thank you for your suggestion... I will try that as well

BigG · Post by **BigG** » Wed Jun 01, 2011 11:44 pm

We aim to create a staging cube for the lowest level data with the partition dimension (weekly timestamp -1 -52 weeks for financial year).

Then create reporting cubes off this base information. (similar to jimshen suggestion)

Two compensate for SCD we have a compromise at rollup Hiererachy level to take a snapshot quartely for new rollup, while retaining the previous hierarchy as a alternate (max 4 rollups). For the leaf node items name changes we will deal with in aliases but 'new' items will be tagged to current hierarchy in correct rollup and alternate (previous quarters) place in dummy dumping rollup.

Future development (next phase) will incorparate a relational database to manage the base data and changes (allowing history past one year to be retained), TM1 will only be the multidimensional reporting source for standard analytical requirements (we use CX Report Studio). Any tricky trend reporting will be managed over the relational view (using CX Report Studio).

TM1 Forum

Cube Desing Question - partioned by date db model

Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model

Re: Cube Desing Question - partioned by date db model