Building dimensions and loading data-separate TI better?
Posted: Wed Dec 01, 2010 12:20 pm
by telula
Hello,
I remember reading somewhere on this forum that it is better to build dimensions before loading the cube and not as we are loading the cube. Is this the common understanding or did I remember incorrectly?
Would the situation be different if the source of your dimension (a text file) is the same as the source of your data load? That is, can we build the dimension in the Metadata tab and load the values in the Data tab, since the source file is the same?
Re: Building dimensions and loading data-separate TI better?
Posted: Wed Dec 01, 2010 1:17 pm
by tomok
This may be just personal preference but I almost always separate metadata processes (dimension maintenance, attributes, etc.) from data loading processes. This is because I find there are many times I want to just load/reload a cube without changing dimensionality. If you create a kitchen-sink type TI process then you can't do this. It does lead to process proliferation but there are ways to help manage this with chores and/or nested processes.
Re: Building dimensions and loading data-separate TI better?
Posted: Wed Dec 01, 2010 4:23 pm
by ajain86
For the most part, I prefer to do dimension maintenance as a separate process prior to the data load.
I do, however, still have steps in the Metadata tab of the load process to add any new members, to avoid errors during the load. This section first checks that the element does not already exist before inserting it.
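A minimal sketch of that Metadata-tab check might look like the following. (The dimension name 'Account' and the variable vAccount are placeholders for this example, not names from the original post.)

```
# Metadata tab: insert the element only if it does not already exist.
# DIMIX returns 0 when the element is not found in the dimension.
IF( DIMIX( 'Account', vAccount ) = 0 );
    DimensionElementInsert( 'Account', '', vAccount, 'N' );
ENDIF;
```

Because DimensionElementInsert errors out if the element already exists, the DIMIX guard is what keeps the load from failing on records for existing members.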
Re: Building dimensions and loading data-separate TI better?
Posted: Wed Dec 01, 2010 10:00 pm
by lotsaram
My opinion is also to always separate dimension updates from cube loads. (Generally the data source will be different anyway: a full fact table or hierarchy extract for dimension updates versus summarised transactional data for cube loads.) Also, making sure metadata is updated first saves significant time compared with processing the data load extracts twice.
ajain86 wrote:For the most part, I prefer to do dimension maintenance as a separate process prior to the data load.
I do, however, still have steps in the Metadata tab of the load process to add any new members, to avoid errors during the load. This section first checks that the element does not already exist before inserting it.
Personally I would only ever do this as a last resort; why process data twice when you don't have to? Whether this is desirable or even feasible depends on data load volumes. For small data sets such as GL it's no problem (though how often does a new cost centre or account get set up and posted to in between dimension updates?). But if the data set is very large, say tens of millions of records, and/or the data load is multi-threaded, then any code on the Metadata tab adds too much processing-time overhead, or can cause other issues such as locking.
Re: Building dimensions and loading data-separate TI better?
Posted: Thu Dec 02, 2010 12:08 am
by Martin Ryan
I take it on a case-by-case basis. If, as lotsa mentions, your dimension source is different from your data source, then it certainly makes sense to keep them separate rather than doing messy joins to make them one data source.
If it's all in one place though I'd do it all in one process. This is mostly to cut down on the proliferation of TI processes. Lately I've seen a couple of relatively small TM1 models with 50+ TI processes and it makes it hard to find what you're looking for (which is why I wrote some VBA that's coming out in the new TM1 Tools edition very soon to help find things).
Technically speaking I don't think it really makes much difference. Sure you could do some error catching between two processes, but for me I'd rather have my model a little smaller instead.
Martin
Re: Building dimensions and loading data-separate TI better?
Posted: Thu Dec 02, 2010 2:44 pm
by ajain86
I agree with Martin that it is a case-by-case decision.
In my limited experience, regular dimension maintenance tends to be a lower priority, and most dimension updates are actually caught during data loads. For this, I create an unmapped bucket in the dimension and add any new members there. That way, the new members can be easily identified and properly mapped later. It also helps the person doing tie-outs after the load, as they no longer have to worry about records failing to load.
This does increase processing time, but I have found it to be safer, especially when the customer does little to no regular maintenance.
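A sketch of that unmapped-bucket approach on the Metadata tab might look like this, assuming an 'Unmapped' consolidation has already been created in the dimension (the dimension and variable names here are illustrative only):

```
# Metadata tab: park unrecognised elements under an 'Unmapped'
# consolidation so their records load cleanly and the new members
# can be identified and remapped after the load.
IF( DIMIX( 'CostCentre', vCostCentre ) = 0 );
    DimensionElementInsert( 'CostCentre', '', vCostCentre, 'N' );
    DimensionElementComponentAdd( 'CostCentre', 'Unmapped', vCostCentre, 1 );
ENDIF;
```

After the load, anyone doing tie-outs can simply drill into the 'Unmapped' rollup to see which members still need a proper home in the hierarchy.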