Calculation time weirdness

Steve Rowe
Site Admin
Posts: 2416
Joined: Wed May 14, 2008 4:25 pm
OLAP Product: TM1
Version: TM1 v6,v7,v8,v9,v10,v11+PAW
Excel Version: Nearly all of them

Calculation time weirdness

Post by Steve Rowe »

There seems to be a bit of a theme recently of unexplainable behaviour in TM1 but then I suppose people don't post threads along the lines of "I logged in and everything happened just like I expected it to!"

So, I'm at a loss to explain some behaviour...

I have a workbook that I have built to "stress" the server; the intention is to distribute this to users, get them to run the workbook, and have them tell me how long it took to calculate. This is all pretty straightforward stuff.

The workbook pulls some ruled data out of the system, the rules are not particularly complex. A load of input ledger data gets remapped in various ways and the results are output in the workbook.

#Pretend rule!
['Result']=N: ['Input Data'] * ['Rule Work'];

From a rule save, the workbook takes 19 minutes to calculate; too long, but that's a different story...

Now, because there is security on the system and I want everyone to work at the top of the entity structure, I've added a rule to the cube, for the purpose of the stress test, that is just:

['random']=N:['Result'] * rand;

this is fed with

['Result'] => ['random'];

This randomises the output of the system and I don't have to mess about with the security just for the stress test. (All users log in as the stress test user, who has access to the Random slice only and All entities.) Now, in theory, this is more work, so you would think that the workbook takes longer to calculate. Wrong!

I've repeated this many times, starting from the system immediately after a rule save:
If I look at 'result' in the workbook, I get a 19 minute calculation, then a 20 second refresh.
If I look at 'random' in the workbook, I get a 13 minute calculation, then a 14 second refresh.

Remember random is further down the calculation path than result.

TM1 Version 9.0 SP3 U9

I just can't think of a logical explanation for this.

A possibility is that rand evaluates to 0 often enough to knock 6 minutes off the calculation time, but that would mean rand evaluating to 0 around a third of the time. It also assumes that TM1 calculates x * 0 faster than x * y, which I would have to be convinced of. Doesn't really hold up.
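For what it's worth, if TM1's RAND behaves like a standard uniform generator on [0, 1) (an assumption on my part; I haven't seen the engine's implementation), drawing exactly 0 is vanishingly rare, nowhere near one time in three. A quick Python sketch of the idea:

```python
import random

# Toy check: a uniform draw on [0, 1) is exactly 0.0 with probability
# roughly 2**-53 per call, so a million draws should produce no zeros
# at all -- far from the ~1-in-3 rate the theory would need.
draws = 1_000_000
zeros = sum(1 for _ in range(draws) if random.random() == 0.0)
print(zeros)  # almost certainly 0
```

So if RAND is anything like this, the "rand is often zero" explanation can be ruled out on arithmetic alone.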

Another is that a significant number of Results are fed but evaluate to 0, which means that Random is not fed and so not calculated. This doesn't fit with my understanding of feeders though, since the engine would have to evaluate Result before it could know whether Random needed feeding.

It also kind of implies that you could overfeed to your heart's content and then just add

['Fast Result']=N: ['Over fed Result'] *1;

to your system and see an immediate performance advantage.

If I change my random rule to

['random']=N:['Result'] * 1;

I do keep the performance gain, which has confused me even more, since random = result at all levels and so the same amount of "work" is going on.

I've either found the quickest of quick fixes ever or have lost the plot entirely!

Anyone else have any views or comments or fancy testing?

Cheers,
Technical Director
www.infocat.co.uk
Marcus Scherer
Community Contributor
Posts: 126
Joined: Sun Jun 29, 2008 9:33 am
OLAP Product: TM1
Version: 10.2.2
Excel Version: 2016
Location: Karlsruhe

Re: Calculation time weirdness

Post by Marcus Scherer »

Just one remark on the procedure: You're talking about calculation "immediately after rule save". Shouldn't you restart the server to test rules properly?

Re: Calculation time weirdness

Post by Steve Rowe »

Hi Marcus,

The cube concerned is self-contained with no external references, so a re-save of the rules should be enough to give me a clear starting point every time, though the feeders don't get cleared by a rule save. Since I'm not changing any data, the number of fed cells should not change either, so there should be no difference in the starting condition of the cube between a restart and a re-save.

I _think_ that only input numbers trigger feeders.

If input numbers and N-level _evaluated_ rule values trigger feeders, then this may provide an explanation.

Something like this:

Result is fed at the most efficient level possible, but a significant number of Result cells evaluate to 0.
This means that the number of Random cells that need calculating is far smaller than the number of Result cells, so you get a performance gain, but only once Result has been calculated and triggered the onwards set of feeders.

This would suggest that how you write feeders is very important.

For example
['Result']=N: ['Input Data'] * ['Rule Work'];
['Random']=N:['Result'] * 1;

Feeders;
#If I write like this then no performance gain

['Input Data'] => ['Result'], ['Random'];

#If I write as two statements I could clear out over feeding from the system
['Input Data'] => ['Result'];
['Result']=>['Random'];

My understanding is that the two ways of writing the above feeders result in an identical number of fed cells and so should not generate better performance, because only input numbers trigger feeders. I think I need to test from a server restart though, as this should prove or disprove the above statement…
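To make the "only input numbers trigger feeders" idea concrete, here's a toy model in Python (not TM1 itself, just a sketch of the compile-time flattening I'm assuming): if chained feeders are expanded transitively starting from the input cells, both ways of writing them end up marking exactly the same cells as fed.

```python
# Toy model of feeder compilation, under the assumption that only
# input (leaf) values trigger feeding and that chained feeders are
# flattened transitively at compile time.

def fed_cells(inputs, feeders):
    """Transitively expand feeder targets starting from the inputs."""
    fed = set()
    frontier = set(inputs)
    while frontier:
        cell = frontier.pop()
        for target in feeders.get(cell, []):
            if target not in fed:
                fed.add(target)
                frontier.add(target)
    return fed

inputs = {"Input Data"}

# Style 1: one combined feeder statement
combined = {"Input Data": ["Result", "Random"]}

# Style 2: two chained statements
chained = {"Input Data": ["Result"], "Result": ["Random"]}

print(fed_cells(inputs, combined) == fed_cells(inputs, chained))  # True
```

Under that assumption the fed-cell sets are identical, which is why I wouldn't expect a performance difference between the two styles.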

Cheers,
nhavis
Posts: 62
Joined: Mon Jan 05, 2009 12:47 am

Re: Calculation time weirdness

Post by nhavis »

Steve Rowe wrote:For example
['Result']=N: ['Input Data'] * ['Rule Work'];
['Random']=N:['Result'] * 1;

Feeders;
#If I write like this then no performance gain

['Input Data'] => ['Result'], ['Random'];

#If I write as two statements I could clear out over feeding from the system
['Input Data'] => ['Result'];
['Result']=>['Random'];
Steve Rowe wrote:My understanding is that the two ways of writing the above feeders result in an identical number of fed cells and so should not generate better performance, because only input numbers trigger feeders. I think I need to test from a server restart though as this should prove or disprove the above statement….
If only 'input numbers' could trigger feeders, and you had rules: A = B, B = C, how could you feed C?

With: ['Input Data'] => ['Result'], ['Random'];

Random only needs to be calculated when Result is non-zero.
If you feed from Input Data then sometimes it's going to be calculated even when Result is zero.
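A toy illustration of that point in Python (made-up numbers, not TM1): wherever Rule Work zeroes a cell out, Result is zero there too, so feeding Random from Input Data marks cells that feeding from the calculated Result never would.

```python
# Hypothetical data: Input Data is non-zero everywhere, but Rule Work
# zeroes out some cells, so Result is zero in those cells as well.
input_data = [5, 3, 8, 2, 7, 4]
rule_work  = [1, 0, 2, 0, 0, 1]
result = [i * w for i, w in zip(input_data, rule_work)]

fed_from_input  = sum(1 for i in input_data if i != 0)  # feeds every cell
fed_from_result = sum(1 for r in result if r != 0)      # feeds only non-zero Results

print(fed_from_input, fed_from_result)  # 6 3
```

Half the Random cells here would be overfed if feeding could only originate from the inputs.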

Re: Calculation time weirdness

Post by Steve Rowe »

Just thought I would update you all on this.

From a clean server restart, Random takes 17 minutes to calculate and Result takes 19 minutes; since Random comes after Result in the calculation chain, there is still something a little strange happening.

The feeders in the cube are of this type:

['Input Data'] => ['Result'];
['Result']=>['Random'];

Thoughts on what is causing this:

There are two (in theory) possible ways for feeders to work.

Input and Evaluated Rules trigger feeders

It's possible that I was wrong and that evaluated ruled cells do trigger feeders. I don't think I am, though… (see the next section).

The 2 minute difference is the additional time it takes to fire the ['Result']=>['Random'] feeder. I don't think this logic holds, as if Result has to be calculated before Random can be calculated, then the calculation time for Random would have to be equal to or greater than the calculation time for Result, which is not true.

The only way I can see to get a shorter calculation time for Random, given that we have to calculate Result, is to consider that what I am timing is the calculation, consolidation and display of values. The only difference between Result and Random is that Result has some fed values that evaluate to 0, while in Random all fed cells contain values. Fed zero values may be treated as populated, and hence slow the system down, because the engine then also needs to calculate consolidations that evaluate to 0.

Traditionally people say that "overfeeding is bad because it makes the engine waste time calculating values that will be zero"; it seems this needs to be extended with "and the engine then treats these fed zeros exactly like fed numbers for consolidation and display purposes, further degrading performance".

Whilst obvious once you write it down, this has really important implications, as in most complex systems it is not possible to feed the end result perfectly. This means the end results are usually subject to a degree of overfeeding. It suggests that putting a fed rule at the "end" of the system, like

['Final Result']=N: ['Result'] *1;

['Result']=> ['Final Result'];

would produce an immediate performance advantage, as it clears the fed zeros from the reporting layer of the application, producing a reporting layer that is perfectly fed and hence more responsive. The level of the performance gain obviously depends on the level of overfeeding in the calculated result layer.
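To put rough numbers on the idea (a Python toy under the fed-zero hypothesis above; none of this is TM1 itself, and the values are invented): the consolidation pass has to visit every fed cell, fed zeros included, but a final slice fed from the calculated Result carries only the non-zero cells.

```python
# Overfed Result slice: the zeros are still "fed", so consolidation
# and display have to visit all of them.
result = [0, 12, 0, 7, 0, 0, 3, 0]
fed_result = len(result)                    # 8 cells visited

# 'Final Result' fed from Result: only non-zero cells are fed, giving
# a perfectly fed reporting layer with less to consolidate.
final = [r for r in result if r != 0]
fed_final = len(final)                      # 3 cells visited

print(fed_result, fed_final)  # 8 3
```

The gain scales with how overfed the calculated layer is: here 5 of 8 fed cells were zeros that the final slice never carries.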

It also means that the number of cells a ruled cell feeds affects the calculation time of the cell itself, which will have an impact on design that would not normally be considered. To be honest, I would have thought I'd have noticed this effect in nearly 10 years of building apps…

Only input values trigger feeders

I have some direct evidence that only input values trigger feeders and that is the information in the StatsByCube cube. If I set the performance monitor running then the “Number of fed cells” and “memory used for feeders” metrics do not change no matter how much calculation I perform. This implies that no new feeding happens when I calculate rules, irrespective of how the feeders are written.

This means that input numbers trigger feeders and that when the rules are compiled
['Input Data'] => ['Result'];
['Result']=>['Random'];

is the same as

['Input Data'] => ['Result'],['Random'];

This means that Random can be calculated without first calculating Result, which at least opens the possibility that Random can be calculated faster than Result. I can’t quite logically think of how this would work as I would now have fed zeros in both slices and cannot see where the performance gain comes from.

There is a halfway house for the behaviour of feeders that would allow for all the results I am seeing: that the engine is somehow able to feed from ruled cells without (and before) actually calculating those cells. This would mean that the number of fed cells does not change after start-up (assuming no new data) and that the "fed zeros" are dropped in the transition between Result and Random. This would mean the engine is very clever; I can't for a moment think how that would work without ending up with a fully evaluated system.

Anyway, I'd better get on with my day job as my head is beginning to ache now. It would be great if someone could test the ['Final Result']=N: ['Result'] *1; thing and see what the results are like.

Cheers
jim wood
Site Admin
Posts: 3951
Joined: Wed May 14, 2008 1:51 pm
OLAP Product: TM1
Version: PA 2.0.7
Excel Version: Office 365
Location: 37 East 18th Street New York

Re: Calculation time weirdness

Post by jim wood »

Have you tried running Random first after a clean restart?

Re: Calculation time weirdness

Post by Steve Rowe »

*cough*
Just thought I would update you all on this.

From a clean server restart, Random takes 17 minutes to calculate and Result takes 19 minutes; since Random comes after Result in the calculation chain, there is still something a little strange happening.
:D

I think I have missed out one vital piece of information while talking about this.

I'm using Excel to view the data. The Excel spreadsheets are a static layout and are created from non-zero-suppressed slices (VIEW function removed). This means that there are plenty of zero cells in the spreadsheet. I think this increases the chance that the difference in time is down to the fed zero values that exist in Result but not in Random.

It probably means that you would only see the performance gain from a ['Final Result']=N: ['Result'] *1; approach if you are looking at the data in an unsuppressed way, but I have not tested this. I'm not going to have time for any more testing on this for a few days now.

Still it's all interesting stuff! Sort of....