
"To split or not to split the cube- that is the question..."

Posted: Mon Sep 10, 2012 5:12 am
by fleaster
Hi all,
I am currently (re)building a finance application in TM1, but have some misgivings about my cube design/architecture. In a nutshell, there are 3 major functions we need to perform with our main GL cube:

-(A)ctuals Reporting – sourced from flat files via TI. In addition there are simple Rules to calc FX, Balance Sheet consolidations, etc. (a sketch follows this list)
-(R)egulatory Reporting – generated from a combination of (A)ctuals data and external cubes via Rules (e.g. mapping & domicile % splits)
-(F)orecasting – generated from a combination of user input and external cubes via Rules (e.g. mapping & product allocation % splits). Will also preload from Actuals data on occasion.
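
For illustration, the FX calc is nothing fancy - a sketch of the kind of rule I mean, with all cube, dimension and element names made up for the example:

    # GL cube rule: reporting-currency value = local value * this month's rate
    # for the entity's currency. Assumes a Currency dimension ('Local'/'USD'),
    # an Entity dimension with a 'Currency' attribute, and an 'FX Rates' cube
    # dimensioned Currency by Month.
    ['USD'] = N: ['Local'] * DB('FX Rates', ATTRS('Entity', !Entity, 'Currency'), !Month);

    # ...and the matching feeder, in the FEEDERS; section:
    ['Local'] => ['USD'];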

Notes:
-all functions involve almost identical dimensionality ((A) and (F) are the same, and (R) may use 1-2 extra dimensions that split Actuals data into additional groupings)
-(R) and (F) involve complex rule calculations (incl. conditional feeders) and large amounts of data

In terms of architecture, the options I have considered are:

Option 1
Use 1 GL cube for performing both user display and rule calculations

Option 2
Create 3 GL cubes, one per function, each with its own user display and rule calculations. Dimensionality will be identical in each, and functions that are the same (e.g. the FX calc) will be repeated in each cube.

Option 3
Retain the main GL cube, and create 2 temporary working cubes that take care of the rule calcs for (R) and (F). Feed the frozen data back into the main GL cube via TI for user display. Additionally, for forecasting, the users may use the temp working cube for input until the forecast is finalized.

…my original prototype was built using Option 1, but I am now reconsidering after hitting some performance issues :(

Option 4
???

What I am interested to know is what experience you have had with the above options, what you have found to be the advantages/disadvantages, and any other advice you can think of... :)

Thanks!

Matt

Re: "To split or not to split the cube- that is the question

Posted: Mon Sep 10, 2012 1:44 pm
by mattgoff
fleaster wrote:…my original prototype was built using Option 1, but I am now reconsidering after hitting some performance issues :(
I'd stick with Option 1; from what you've described, it sounds like the correctly normalized design, and splitting things up will not in itself improve performance. I'd take a closer look at your rules/feeders to see if you can improve performance there. If they are not fixable, you could consider calculating via TI instead and refreshing periodically - a sketch of what I mean is below.
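
A minimal sketch of that TI approach, assuming a source view 'zSnapshot' over the rule-calculated cells of the 'GL' cube and a static target cube 'GL Static' (all names hypothetical; the vScenario/vDept/vAccount/vMonth/nValue variables would be defined on the Variables tab):

    #--- Prolog: make sure the source view extracts rule-calculated cells
    ViewExtractSkipRuleValuesSet('GL', 'zSnapshot', 0);
    ViewExtractSkipCalcsSet('GL', 'zSnapshot', 1);    # skip consolidated cells
    ViewExtractSkipZeroesSet('GL', 'zSnapshot', 1);   # skip zeroes

    #--- Data: write each rule-calculated value into the target as a static number
    CellPutN(nValue, 'GL Static', vScenario, vDept, vAccount, vMonth);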

Matt

Re: "To split or not to split the cube- that is the question

Posted: Mon Sep 10, 2012 2:27 pm
by tomok
I'm of the opinion you should never mix a live budgeting/forecasting system with an "Actuals" GL reporting one. I think it opens up too many possibilities for problems, and I don't like it from a Sarbanes-Oxley perspective (forget this if you are not US-based). I would separate the budgeting/forecasting into a separate cube and just move the "final" versions over to the reporting cube when they are signed off.

Re: "To split or not to split the cube- that is the question

Posted: Mon Sep 10, 2012 2:32 pm
by David Usherwood
@tomok, you surprise me. What about using security to hide the unfinished budgets from the users?
My take is that budgets and reforecasts are typically less granular than actuals, so they don't fit very well in the same cube. Same outcome, I guess :)

Re: "To split or not to split the cube- that is the question

Posted: Mon Sep 10, 2012 2:40 pm
by tomok
David Usherwood wrote:@tomok, you surprise me. What about using security to hide the unfinished budgets from the users?
That's certainly an option but I have an affinity for keeping things "clean". I don't like to introduce security into a situation unless it is necessary. Also don't like the idea of writing into what should be a read-only (i.e., reporting) environment. I know IBM has addressed this with new features but I still shy away from it. When it comes to model design I like to keep it as clean as possible and compartmentalized. I think it's easier to understand and maintain that way.

Re: "To split or not to split the cube- that is the question

Posted: Mon Sep 10, 2012 3:38 pm
by mattgoff
tomok wrote:
David Usherwood wrote:@tomok, you surprise me. What about using security to hide the unfinished budgets from the users?
That's certainly an option but I have an affinity for keeping things "clean". I don't like to introduce security into a situation unless it is necessary. Also don't like the idea of writing into what should be a read-only (i.e., reporting) environment. I know IBM has addressed this with new features but I still shy away from it. When it comes to model design I like to keep it as clean as possible and compartmentalized. I think it's easier to understand and maintain that way.
What you call "clean" I'd call un-normalized. :D

I handle it with a mix of security and rules, and I think it works well. We only have security on two dimensions: Scenario and Dept.
  • Actuals come in from the Oracle GL via TI to an "Actuals" scenario which is read-only.
  • Forecast is our only writable scenario and has a copy of actuals for closed periods (closed vs open is set in a control cube). This is achieved with a rule (see the sketch after this list), so de facto closed periods are read-only without the need for period security.
  • Budget is read-only. We actually plan budget in the above (Forecast) scenario. Once it's approved, I copy to Budget.
  • Snapshots are read-only. I make a static copy of Forecast once a month after the internal submission deadline and all central, forecast-based reporting uses this instead of Forecast (so that scenario is immediately available for adjustments for the next forecast). We save snapshots indefinitely so we can track how forecasts changed over time.
This keeps things very simple for the users: Forecast is the only writable scenario; anything in Budget is the final/approved budget (if it's still blank for a year, there's no approved budget yet); and anything in a Snapshot is the "official" forecast we submitted.
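
Roughly, that closed-period rule looks like this (names simplified, and the control cube layout here is hypothetical rather than our real setup):

    SKIPCHECK;

    # Closed periods in Forecast mirror Actuals; open periods accept user input.
    ['Forecast'] = N:
        IF(DB('PeriodControl', !Month, 'Status') @= 'Closed',
           DB('GL', 'Actuals', !Dept, !Account, !Month),
           CONTINUE);

    FEEDERS;
    # Feed Forecast from Actuals so the copied values roll up in consolidations.
    ['Actuals'] => ['Forecast'];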

Matt

Re: "To split or not to split the cube- that is the question

Posted: Tue Sep 11, 2012 2:10 am
by fleaster
Hi all, thanks for the responses - some interesting opinions on offer :)
mattgoff wrote: This keeps things very simple for the users: Forecast is the only writable scenario; anything in Budget is the final/approved budget (if it's still blank for a year, there's no approved budget yet); and anything in a Snapshot is the "official" forecast we submitted.
Actually, this is the way we run our current forecasts; however, we tend to use TI rather than rules, so the current rebuild is about moving towards live rule calculations to do the workings instead.

So my follow-up question is: we know TM1 can handle large amounts of (static) data well, but is there such a thing as "having too many rules in the one cube"...?

...I am wondering if performance/memory will degrade at some point (particularly with several conditional feeders), regardless of how well the rules/feeders are designed...
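
For context, by "conditional feeder" I mean this kind of pattern (all cube and dimension names made up):

    # In the FEEDERS; section: feed the Regulatory cube only where a mapping
    # cube flags the account as in scope. The DB() on the right-hand side of =>
    # picks the target cube at feed time; an empty cube name means nothing is fed.
    ['Actuals'] => DB(IF(DB('Mapping', !Account, 'InScope') @= 'Y', 'Regulatory', ''),
                      !Dept, !Account, !Month);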

Re: "To split or not to split the cube- that is the question

Posted: Tue Sep 11, 2012 4:01 pm
by mattgoff
fleaster wrote:So my follow-up question is: we know TM1 can handle large amounts of (static) data well, but is there such a thing as "having too many rules in the one cube"...?

...I am wondering if performance/memory will degrade at some point (particularly with several conditional feeders), regardless of how well the rules/feeders are designed...
Of course. :geek:

Re: "To split or not to split the cube- that is the question

Posted: Wed Dec 19, 2012 11:40 pm
by fleaster
...just thought I'd tack on an additional thought - to reiterate, I am using one cube for a few different reporting purposes (e.g. Actuals/Budget/Regulatory); however, the dimensionality for each is very similar, just using different structures... so my question is:

Is it good or bad practice to split the cube into different reporting cubes based on size?

e.g. for 5 years' worth of data:
-One big cube = 3 GB
-Separate Actual/Budget cube = 1 GB
-Separate Regulatory cube = 2 GB

...obviously the rules/drills would have to be replicated into the separate cubes, and we may need some cube-to-cube drills... but do you think splitting the cubes would materially improve performance or not? What other considerations would need to be factored in...?

Re: "To split or not to split the cube- that is the question

Posted: Thu Dec 20, 2012 9:12 am
by garry cook
I'd consider cache invalidation on data uploads in your design. Segregating into different cubes will have an impact on this: if the data is all in one cube, your caches get wiped whenever anything is updated. In a split design, depending on your cube dependencies, your Actual/Reg data should retain the speed improvement of caching when forecasts change (which I'm assuming happens far more often than Actuals/Regs updates). In one cube, those cached values would be lost whenever forecasts change.

May be insignificant in your model but worth bearing in mind if the cube is 3 GB or so.

Regards

Re: "To split or not to split the cube- that is the question

Posted: Mon Dec 24, 2012 1:47 am
by fleaster
thanks for the tip - would you be able to expand on what "cache invalidation" is and how it works?

e.g. are you referring to things like Parallel Interaction...?

cheers,

Matt

Re: "To split or not to split the cube- that is the question

Posted: Mon Dec 24, 2012 3:53 am
by Alan Kirk
fleaster wrote:thanks for the tip - would you be able to expand on what "cache invalidation" is and how it works?

e.g. are you referring to things like Parallel Interaction...?
I'd leave that for Garry to answer but he probably has more sense than me and isn't working on a budget model on Christmas Eve.

It's nothing quite so glamorous; just a reference to the caching of calculated numbers. As you probably know, TM1 evaluates calculated cells only when a user queries them. However, once it's done that it doesn't need to do it again, because it keeps the result in memory; that is, the results are cached. If someone else (or the same user) queries the same numbers, they don't have to be recalculated because they're already in memory, which improves the read time.

That is, until someone changes any of the underlying values, which of course invalidates the cache. If you want to see this in action, create a simple cube with a few consolidated values and turn on Performance Monitor, query some consolidated values, wait a minute for them to show up in the }StatsByCube cube, and check out the "Number of Stored Calculated Cells" metric. You should see some value in there. Then change some of the underlying numbers, wait another minute, and check the results again. They will probably be back to zero, because there are no longer any valid cached calculations. If the same values are queried again, TM1 will need to recalculate them.
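
If you'd rather poll that metric than eyeball it, something along these lines in TI should do it - but check the dimension order of }StatsByCube on your own server first, as the order used here (}TimeIntervals, }PerfCubes, }StatsStatsByCube) is an assumption on my part:

    # Read the stored-calculated-cells count for one cube from the stats cube.
    # 'LATEST' is the most recent sampling interval; 'GL' is the cube to check.
    nStored = CellGetN('}StatsByCube', 'LATEST', 'GL', 'Number of Stored Calculated Cells');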

If you split your values into different cubes, then only the ones that are actively updated will have their caches invalidated. The ones that are only updated periodically (say, forecast or budget) will retain their cached numbers for longer, giving you a performance advantage over having them all in one cube. I think Garry's point was (and if his wasn't, mine is) that if you're talking about a seriously big cube, the performance price you can pay through the cache being invalidated every time you load from a source system (as you probably do with Actuals regularly) can be significant. With a relatively small cube it may not be significant at all. However, there are always swings and roundabouts with these things; generally speaking, for instance, it's easier for an end user to create ad hoc views in a single cube than across multiple cubes (something you can't do at all in Cube Viewer). You just have to work the balance between usability and performance.

Re: "To split or not to split the cube- that is the question

Posted: Wed Dec 26, 2012 11:04 pm
by fleaster
Alan Kirk wrote:I'd leave that for Garry to answer but he probably has more sense than me and isn't working on a budget model on Christmas Eve.
lol better than working on Boxing Day night! :p ...thanks for responding though :)

...OK, so my strategy so far has been to have one big multi-purpose financials cube (16 dimensions) to maximize usability/flexibility of analysis, but I have also tried to minimize rule calculations and use static values where possible.

As such the performance is OK, but of course the trade-off is that the size of the cube has grown dramatically. Assuming we have enough RAM to cover the size, my question is: are there any other considerations/drawbacks to this approach?

e.g.
-is it like "putting all your eggs in one basket"? :p
-any long term limitations like maximum cube size?

cheers!

Matt