Calculation time weirdness

Steve Rowe
Site Admin
Posts: 2416
Joined: Wed May 14, 2008 4:25 pm
OLAP Product: TM1
Version: TM1 v6,v7,v8,v9,v10,v11+PAW
Excel Version: Nearly all of them

Calculation time weirdness

Post by Steve Rowe »

There seems to be a bit of a theme recently of unexplainable behaviour in TM1 but then I suppose people don't post threads along the lines of "I logged in and everything happened just like I expected it to!"

So, I'm at a loss to explain some behaviour...

I have a workbook that I have built to "stress" the server; the intention is to distribute this to users, get them to run the workbook, and have them tell me how long it took to calculate. This is all pretty straightforward stuff.

The workbook pulls some ruled data out of the system, the rules are not particularly complex. A load of input ledger data gets remapped in various ways and the results are output in the workbook.

#Pretend rule!
['Result']=N: ['Input Data'] * ['Rule Work'];

From a rule save, the workbook takes 19 minutes to calculate; too long, but that's a different story...

Now, because there is security on the system and I want everyone to work at the top of the entity structure, I've added a rule to the cube, for the purpose of the stress test, that is just:

['random']=N:['Result'] * rand;

this is fed with

['Result'] => ['random'];

This randomises the output of the system and I don't have to mess about with the security just for the stress test. (All users log in as the stress test user, who has access to the Random slice only and All entities.) Now, in theory, this is more work, so you would think that the workbook takes longer to calculate. Wrong!

I've repeated this many times, starting from the system immediately after a rule save:
If I look at 'result' in the workbook, I get a 19 minute calculation, then a 20 second refresh.
If I look at 'random' in the workbook, I get a 13 minute calculation, then a 14 second refresh.

Remember random is further down the calculation path than result.

TM1 Version 9.0 SP3 U9

I just can't think of a logical explanation for this.

A possibility is that rand evaluates to 0 often enough to knock 6 minutes off the calculation time, but that would mean rand evaluating to 0 around a third of the time. It also assumes that TM1 calculates x * 0 faster than x * y, which I would have to be convinced of. Doesn't really hold up.
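For what it's worth, if TM1's RAND behaves like a standard uniform generator on [0, 1) (an assumption on my part; I haven't seen the engine's implementation), drawing exactly 0 is vanishingly rare, nowhere near one time in three. A quick Python sketch of the idea:

```python
import random

# Toy check: a uniform draw on [0, 1) is exactly 0.0 with probability
# roughly 2**-53 per call, so a million draws should produce no zeros
# at all -- far from the ~1-in-3 rate the theory would need.
draws = 1_000_000
zeros = sum(1 for _ in range(draws) if random.random() == 0.0)
print(zeros)  # almost certainly 0
```

So if RAND is anything like this, the "rand is often zero" explanation can be ruled out on arithmetic alone.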

Another is that a significant number of Results are fed but evaluate to 0, which means that Random is not fed and so not calculated. This doesn't fit with my understanding of feeders though, since the engine would have to evaluate Result before it could know whether Random needed feeding.

It also kind of implies that you could overfeed to your heart's content and then just add

['Fast Result']=N: ['Over fed Result'] *1;

to your system and see an immediate performance advantage.

If I change my random rule to

['random']=N:['Result'] * 1;

I do keep the performance gain, which has confused me even more, since random = result at all levels and so the same amount of "work" is going on.

I've either found the quickest of quick fixes ever or have lost the plot entirely!

Anyone else have any views or comments or fancy testing?

Cheers,
Technical Director
www.infocat.co.uk
Marcus Scherer
Community Contributor
Posts: 126
Joined: Sun Jun 29, 2008 9:33 am
OLAP Product: TM1
Version: 10.2.2
Excel Version: 2016
Location: Karlsruhe

Re: Calculation time weirdness

Post by Marcus Scherer »

Just one remark on the procedure: You're talking about calculation "immediately after rule save". Shouldn't you restart the server to test rules properly?

Re: Calculation time weirdness

Post by Steve Rowe »

Hi Marcus,

The cube concerned is self-contained with no external references, so a re-save of the rules should be enough to give me a clear starting point every time, though the feeders don't get cleared by a rule save. Since I'm not changing any data, the number of fed cells should not change either, so there should be no difference in the starting condition of the cube between a restart and a re-save.

I _think_ that only input numbers trigger feeders.

If input numbers and N-level _evaluated_ rule values trigger feeders, then this may provide an explanation.

Something like this:

Result is fed at the most efficient level possible, but a significant number of Result cells evaluate to 0.
This means that the number of Random cells that need calculating is far smaller than the number of Result cells, so you get a performance gain, but only once Result has been calculated and triggered the onwards set of feeders.

This would suggest that how you write feeders is very important.

For example
['Result']=N: ['Input Data'] * ['Rule Work'];
['Random']=N:['Result'] * 1;

Feeders;
#If I write like this then no performance gain

['Input Data'] => ['Result'], ['Random'];

#If I write as two statements I could clear out over feeding from the system
['Input Data'] => ['Result'];
['Result']=>['Random'];

My understanding is that the two ways of writing the above feeders result in an identical number of fed cells and so should not generate better performance, because only input numbers trigger feeders. I think I need to test from a server restart though, as this should prove or disprove the above statement…
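To make the "only input numbers trigger feeders" idea concrete, here's a toy model in Python (not TM1 itself, just a sketch of the compile-time flattening I'm assuming): if chained feeders are expanded transitively starting from the input cells, both ways of writing them end up marking exactly the same cells as fed.

```python
# Toy model of feeder compilation, under the assumption that only
# input (leaf) values trigger feeding and that chained feeders are
# flattened transitively at compile time.

def fed_cells(inputs, feeders):
    """Transitively expand feeder targets starting from the inputs."""
    fed = set()
    frontier = set(inputs)
    while frontier:
        cell = frontier.pop()
        for target in feeders.get(cell, []):
            if target not in fed:
                fed.add(target)
                frontier.add(target)
    return fed

inputs = {"Input Data"}

# Style 1: one combined feeder statement
combined = {"Input Data": ["Result", "Random"]}

# Style 2: two chained statements
chained = {"Input Data": ["Result"], "Result": ["Random"]}

print(fed_cells(inputs, combined) == fed_cells(inputs, chained))  # True
```

Under that assumption the fed-cell sets are identical, which is why I wouldn't expect a performance difference between the two styles.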

Cheers,
nhavis
Posts: 62
Joined: Mon Jan 05, 2009 12:47 am

Re: Calculation time weirdness

Post by nhavis »

Steve Rowe wrote:For example
['Result']=N: ['Input Data'] * ['Rule Work'];
['Random']=N:['Result'] * 1;

Feeders;
#If I write like this then no performance gain

['Input Data'] => ['Result'], ['Random'];

#If I write as two statements I could clear out over feeding from the system
['Input Data'] => ['Result'];
['Result']=>['Random'];
Steve Rowe wrote:My understanding is that the two ways of writing the above feeders result in an identical number of fed cells and so should not generate better performance, because only input numbers trigger feeders. I think I need to test from a server restart though as this should prove or disprove the above statement….
If only 'input numbers' could trigger feeders, and you had rules: A = B, B = C, how could you feed C?

With: ['Input Data'] => ['Result'], ['Random'];

Random only needs to be calculated when Result is non-zero.
If you feed from Input Data then sometimes it's going to be calculated even when Result is zero.
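A toy illustration of that point in Python (made-up numbers, not TM1): wherever Rule Work zeroes a cell out, Result is zero there too, so feeding Random from Input Data marks cells that feeding from the calculated Result never would.

```python
# Hypothetical data: Input Data is non-zero everywhere, but Rule Work
# zeroes out some cells, so Result is zero in those cells as well.
input_data = [5, 3, 8, 2, 7, 4]
rule_work  = [1, 0, 2, 0, 0, 1]
result = [i * w for i, w in zip(input_data, rule_work)]

fed_from_input  = sum(1 for i in input_data if i != 0)  # feeds every cell
fed_from_result = sum(1 for r in result if r != 0)      # feeds only non-zero Results

print(fed_from_input, fed_from_result)  # 6 3
```

Half the Random cells here would be overfed if feeding could only originate from the inputs.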

Re: Calculation time weirdness

Post by Steve Rowe »

Just thought I would update you all on this.

From a clean server restart, Random takes 17 minutes to calculate and Result takes 19 minutes; since Random comes after Result in the calculation chain, there is still something a little strange happening.

The feeders in the cube are of this type:

['Input Data'] => ['Result'];
['Result']=>['Random'];

Thoughts on what is causing this:

There are two (in theory) possible ways for feeders to work.

Input and Evaluated Rules trigger feeders

It's possible that I was wrong and that evaluated ruled cells do trigger feeders. I don't think I am, though… (see the next section).

The 2 minute difference is the additional time it takes to fire the ['Result']=>['Random'] feeder. I don't think this logic holds, as if Result has to be calculated before Random can be calculated, then the calculation time for Random would have to be equal to or greater than the calculation time for Result, which is not true.

The only way I can see to get a shorter calculation time for Random, given that we have to calculate Result, is to consider that what I am timing is the calculation, consolidation and display of values. The only difference between Result and Random is that Result has some fed values that evaluate to 0, while in Random all fed cells contain values. Fed zero values may be treated as populated, and hence slow the system down, because the engine then also needs to calculate consolidations that evaluate to 0.

Traditionally people say that "overfeeding is bad because it makes the engine waste time calculating values that will be zero"; it seems this needs to be extended with "and the engine then treats these fed zeros exactly like fed numbers for consolidation and display purposes, further degrading performance".

Whilst obvious once you write it down, this has really important implications, as in most complex systems it is not possible to feed the end result perfectly. This means the end results are usually subject to a degree of overfeeding. It suggests that putting a fed rule at the "end" of the system, like

['Final Result']=N: ['Result'] *1;

['Result']=> ['Final Result'];

would produce an immediate performance advantage, as it clears the fed zeros from the reporting layer of the application, producing a reporting layer that is perfectly fed and hence more responsive. The level of the performance gain obviously depends on the level of overfeeding in the calculated result layer.
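To put rough numbers on the idea (a Python toy under the fed-zero hypothesis above; none of this is TM1 itself, and the values are invented): the consolidation pass has to visit every fed cell, fed zeros included, but a final slice fed from the calculated Result carries only the non-zero cells.

```python
# Overfed Result slice: the zeros are still "fed", so consolidation
# and display have to visit all of them.
result = [0, 12, 0, 7, 0, 0, 3, 0]
fed_result = len(result)                    # 8 cells visited

# 'Final Result' fed from Result: only non-zero cells are fed, giving
# a perfectly fed reporting layer with less to consolidate.
final = [r for r in result if r != 0]
fed_final = len(final)                      # 3 cells visited

print(fed_result, fed_final)  # 8 3
```

The gain scales with how overfed the calculated layer is: here 5 of 8 fed cells were zeros that the final slice never carries.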

It also means that the number of cells a ruled cell feeds affects the calculation time of the cell itself, which will have an impact on design that would not normally be considered. To be honest, I would have thought I'd have noticed this effect in nearly 10 years of building apps…

Only input values trigger feeders

I have some direct evidence that only input values trigger feeders and that is the information in the StatsByCube cube. If I set the performance monitor running then the “Number of fed cells” and “memory used for feeders” metrics do not change no matter how much calculation I perform. This implies that no new feeding happens when I calculate rules, irrespective of how the feeders are written.

This means that input numbers trigger feeders and that when the rules are compiled
['Input Data'] => ['Result'];
['Result']=>['Random'];

is the same as

['Input Data'] => ['Result'],['Random'];

This means that Random can be calculated without first calculating Result, which at least opens the possibility that Random can be calculated faster than Result. I can’t quite logically think of how this would work as I would now have fed zeros in both slices and cannot see where the performance gain comes from.

There is a halfway house for the behaviour of feeders that would allow for all the results I am seeing: that the engine is somehow able to feed from ruled cells without (and before) actually calculating those cells. This would mean that the number of fed cells does not change after start-up (assuming no new data) and that the "fed zeros" are dropped in the transition between Result and Random. This would mean the engine is very clever; I can't for a moment think how that would work without ending up with a fully evaluated system.

Anyway, I'd better get on with my day job as my head is beginning to ache now. It would be great if someone could test the ['Final Result']=N: ['Result'] *1; thing and see what the results are like.

Cheers
jim wood
Site Admin
Posts: 3951
Joined: Wed May 14, 2008 1:51 pm
OLAP Product: TM1
Version: PA 2.0.7
Excel Version: Office 365
Location: 37 East 18th Street New York

Re: Calculation time weirdness

Post by jim wood »

Have you tried running Random first after a clean restart?

Re: Calculation time weirdness

Post by Steve Rowe »

*cough*
Just thought I would update you all on this.

From a clean server restart, Random takes 17 minutes to calculate and Result takes 19 minutes; since Random comes after Result in the calculation chain, there is still something a little strange happening.
:D

I think I have missed out one vital piece of information while talking about this.

I'm using Excel to view the data. The Excel spreadsheets are a static layout and are created from non-zero-suppressed slices (VIEW function removed). This means that there are plenty of zero cells in the spreadsheet. I think this increases the chance that the difference in time is down to the fed zero values that exist in Result but not in Random.

It probably means that you would only see the performance gain from a ['Final Result']=N: ['Result'] *1; approach if you are looking at the data in an unsuppressed way, but I have not tested this. I'm not going to have time for any more testing on this for a few days now.

Still it's all interesting stuff! Sort of....