Correlation between "Commit" times and the Transaction Log
Posted: Mon Jan 09, 2012 11:17 pm
by bradohare
Hello,
As I would imagine many of you know, when a query completes that references cells not previously cached, TM1 "commits" these cells at the conclusion of the query (this is what is displayed if you are watching TM1Top). During this commit TM1 becomes single threaded, or in other words, no one can do anything in TM1 until the commit completes.
What I am curious about is what exactly is going on during this "commit" process. We've noticed that there is a direct correlation between the length of time the "commit" runs for and the transaction log size. So, if our transaction log is, let's say, 30 MB, the "commit" would take 3 seconds. However, if the transaction log was 1 GB that same query would take 20 seconds to commit (don't quote me on the exact ratio, the increase is just very noticeable).
Any thoughts?
Thanks,
Brad
Re: Correlation between "Commit" times and the Transaction Log
Posted: Tue Jan 10, 2012 12:45 am
by tomok
There is no correlation between the size of a log file and anything, except for how long it may take to do a system save. You could have a large log file because you have been busy making changes to a cube or simply because it's been a week or more since you did a save. Also a "query" does nothing to force a commit. A commit is what happens when you WRITE to a cube, not READ from it.
That being said, if you have a 30MB log file then something is wrong. You are probably logging TI processes, which is likely unnecessary depending on your setup. Turning that off in your processes will speed them up immensely.
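For example, a minimal TurboIntegrator sketch of switching logging off for the duration of a process (the cube name is hypothetical):

# Prolog: suspend transaction logging for the cube being loaded
CubeSetLogChanges('SalesCube', 0);

# ... the load itself runs in the Data tab ...

# Epilog: switch logging back on once the load has finished
CubeSetLogChanges('SalesCube', 1);

Just bear in mind that anything loaded while logging is off can't be recovered from tm1s.log after a crash, so this only suits data you can reload from source.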
Re: Correlation between "Commit" times and the Transaction Log
Posted: Tue Jan 10, 2012 8:44 am
by Duncan P
I think that perhaps this is about the commit of calculated results to the shared calculation cache structure. Are you sure that the blocking is of the whole server and not just the cube being queried? Also, what version are you on?
Re: Correlation between "Commit" times and the Transaction Log
Posted: Tue Jan 10, 2012 5:06 pm
by bradohare
Tomok,
I don't believe you're correct with regard to the "Commit" and WRITEs and READs. We've been watching TM1Top closely for months, and while the "Commits" occur at the conclusion of a WRITE, they also most certainly occur at the end of a READ query. I think Duncan P's hypothesis with regard to the cache structure is correct.
With regard to the transaction log size, 30 MB for us is nothing. This is an extremely large daily/highly available model with large amounts of data being pulled in every couple of minutes, and then that data needs to get replicated across multiple cubes 6x (long story). We also need our main instance to sync with an input instance every couple of minutes.
We've found that if we perform a SaveDataAll with the sync running, the instance will hang/deadlock, so that is not an option. Because of this we have intentionally left cube logging on. So, if there is a server crash or reboot, when the instance comes up there won't be lost data because it will read from the transaction log. We are currently working on a solution to pause our sync so we can run our SaveDataAlls. The environment of the application is extremely locked down and we are not permitted to make any "changes" to the application. Ideally, we would just turn off the sync chore and perform the SaveDataAll; however, this is considered a change so we cannot do it.
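For illustration, the sort of pause we have in mind could be sketched in TI with a flag cell in a control cube (the 'Sys Control' cube and element names below are purely hypothetical, not what we actually have). Since TI writes only commit when a process ends, the flag set, the save and the flag clear would have to be separate processes:

# Prolog of the sync process: skip this run while a save is in progress
IF(CellGetN('Sys Control', 'Sync Paused', 'Value') = 1);
  ProcessQuit;
ENDIF;

# Prolog of a 'Pause Sync' process, run first
CellPutN(1, 'Sys Control', 'Sync Paused', 'Value');

# Prolog of a 'Save All' process, run once any in-flight sync has finished
SaveDataAll;

# Prolog of a 'Resume Sync' process, run last
CellPutN(0, 'Sys Control', 'Sync Paused', 'Value');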
With that said, as I mentioned previously this is a highly available system (23 hours a day, 7 days a week) and the loss of data would result in us having to reload 4 years of historical daily data. That would result in an outage of 4-5 hours, which is simply not acceptable for our client. In this situation, the safety net of a transaction log is a necessary evil.
Duncan,
Yeah, I'm sure it locks up the entire server. For example, when it starts to perform the "Commit" you can't even navigate the instance in Perspectives. In Top you can even see the Excel Add-in verifying its connectivity to the server, waiting for the commit to complete. Personally, prior to this engagement I wasn't even aware of the nature of a "Commit" occurring, because the models were smaller and less calculated. The Commits would occur, however they would happen very quickly and wouldn't be noticed. Also, I've never had to watch Top this closely previously. We are on 9.5.1 currently.
Thanks,
Brad
Re: Correlation between "Commit" times and the Transaction Log
Posted: Wed Jan 11, 2012 10:22 am
by qml
Brad, I can confirm your observations from my experience.
With some big models you can definitely notice "Commit" calls in TM1Top when a data retrieval thread (e.g. a user refreshing a report) finishes. And I agree with you that these Commits put a lock on the server and no other thread can continue to run until the commit is finished.
It would be nice if anyone could shed some more light on the subject, but I've always assumed it to be doing pretty much what you said.
Re: Correlation between "Commit" times and the Transaction Log
Posted: Wed Jan 11, 2012 1:06 pm
by lotsaram
bradohare wrote:We've found that if we perform a SaveDataAll with the sync running, the instance will hang/deadlock, so that is not an option. Because of this we have intentionally left cube logging on. So, if there is a server crash or reboot, when the instance comes up there won't be lost data because it will read from the transaction log. We are currently working on a solution to pause our sync so we can run our SaveDataAlls. The environment of the application is extremely locked down and we are not permitted to make any "changes" to the application. Ideally, we would just turn off the sync chore and perform the SaveDataAll; however, this is considered a change so we cannot do it.
What is it about a certain type of IT management that so perfectly allows the precise lobotomization of the common-sense lobe? I feel for you. It sounds like you have a reasonably large and complex model, and these sorts of operational rules only serve to make everyone's lives more difficult, make systems less agile, and make changes much slower and more expensive to implement. To use a simple car analogy: you have a steering wheel but you aren't being allowed to use it; rather, you have to design and deploy a self-steering system instead! Far from reducing operational risk, this lack of common sense about what is a true technical change or configuration change versus a change to an operational parameter leads to sub-optimally performing systems, unhappy customers, and frustrated administrators and developers.
bradohare wrote:Tomok,
I don't believe you're correct with regard to the "Commit" and WRITEs and READs. We've been watching TM1Top closely for months, and while the "Commits" occur at the conclusion of a WRITE, they also most certainly occur at the end of a READ query. I think Duncan P's hypothesis with regard to the cache structure is correct.
I agree with this assessment as well. On completion of view retrieval/calculation any new calculated values get written to the calculation cache, and this commit action does cause a lock (although typically only for the specific cube(s), as opposed to server-wide, unless a metadata change is also involved, such as re-evaluation of a dynamic subset). Three suggested actions which may help:
1/ Make sure that the views being queried in the read threads causing the locks aren't using any dynamic subsets. If so, replace them with static subsets (not sure of the "change" impact on your system here!); there's a sketch of a static rebuild after this list.
2/ It is possible to effectively disable the calculation cache by using the setting InfiniteThresholdForSingleCellStorage=T in tm1s.cfg (see the excerpt after this list). As the calculation cache isn't written to, there is no commit at the end of a calculation activity and hence no lock (dynamic subsets could still be an issue though). There is a trade-off in performance though, as no cache means all view retrieval is calculated from scratch each time. However, in a model with frequent updates this may not be much of an impact. The change is simple and just requires a restart to take effect, but in your environment it would certainly be seen as a change ...
3/ Upgrade to 9.5.2 with parallel interaction (probably a long shot in the environment you have described). Server-wide locking is a thing of the past, and locks only occur due to metadata changes.
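On point 1, a minimal TI sketch of rebuilding a subset statically (the dimension, subset name and MDX are hypothetical; note that SubsetDestroy will error if the subset is in use by a saved public view):

# Recreate a subset as a static list instead of leaving it dynamic.
# SubsetCreatebyMDX evaluates the MDX once at creation time here,
# so the stored subset is static.
sDim = 'Product';
sSub = 'Level Zero Products';
IF(SubsetExists(sDim, sSub) = 1);
  SubsetDestroy(sDim, sSub);
ENDIF;
SubsetCreatebyMDX(sSub, '{TM1FILTERBYLEVEL( {TM1SUBSETALL( [Product] )}, 0)}');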
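And on point 2, the setting just goes in tm1s.cfg with the other server parameters and needs a service restart to take effect, e.g.:

[TM1S]
InfiniteThresholdForSingleCellStorage=T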
Re: Correlation between "Commit" times and the Transaction Log
Posted: Wed Jan 11, 2012 1:53 pm
by jim wood
InfiniteThresholdForSingleCellStorage=T
It's a shame you can't set this for individual cubes rather than for the whole service. In a perfect world you would then be able to keep caching on for the summary cubes that are hit most often.
Re: Correlation between "Commit" times and the Transaction Log
Posted: Thu Jan 12, 2012 5:51 pm
by bradohare
Hi lotsaram,
With regard to your thoughts on IT management, as you would guess, I completely agree with you. However, this is the hand that was dealt; it is what it is. I'm most certainly not going to change anything, and it's something we need to live with.
With that said, this goes back to my main question and the point of my confusion: why would the trans log have anything to do with the time a commit takes? It just doesn't make sense to me.
To each of your points:
1 - Knowing the potential issue with dynamic subsets, we avoid them like the plague, making almost all of our subsets static.
2 - Thanks for this, I wasn't aware of this setting. Unfortunately, we cannot use it. We have a handful of cubes in our instance; one is a "calculated" cube that is performant and reflects data as of the previous end of day, and another is a "live" cube pulling in data every couple of minutes. We need to have caching on for the calculated cube. Jim's point is a good one, as it would be great to disable this caching for an individual cube.
3 - In the cards and definitely on the horizon. Though our model is massive and we are already maxing out the memory on our boxes. Trying to get more memory has been a chore, and we don't have a clear idea how much more we would need for 9.5.2. IBM says 10 - 30% but I don't buy it; I think it'll be more.
thanks
brad
Re: Correlation between "Commit" times and the Transaction Log
Posted: Thu Jan 12, 2012 6:08 pm
by Mike Cowie
Hi Brad:
bradohare wrote:With that said, this goes back to my main question and the point of my confusion: why would the trans log have anything to do with the time a commit takes? It just doesn't make sense to me.
Sorry to jump in a bit late, but what platform are you running your TM1 Server(s) on? We've seen (to this point somewhat anecdotally) some odd performance issues that seem to correlate with larger transaction logs at one of our customers running TM1 on a Unix OS. I'd have to dig around a bit to find out more on that, but I wanted to mention it in case that was true in your case too. If you're running TM1 on a Windows OS then I can't say I've seen the same correlation (and we have way more customers running TM1 on Windows servers).
Regards,
Mike
Re: Correlation between "Commit" times and the Transaction Log
Posted: Thu Jan 12, 2012 6:14 pm
by lotsaram
bradohare wrote:Why would the trans log have anything to do with the time a commit takes? It just doesn't make sense to me.
It has changed a bit from version to version, but from 9.4 changes to the tm1s.log file are also held in RAM (I guess from a design perspective this makes rollback easier...), so if you have a large number of transaction log entries to write to disk then this will cause longer commit times.
WRT 9.5.2 and PI, how much additional memory might be needed is really going to depend on concurrency and usage more than anything else. If you have very large numbers of users hammering away doing top-down spreading data entry in a large model then the increase might be huge. If the number of concurrent inputs is small and the scope of the data entry itself is small and fast, then it might be imperceptible. In our case we do have quite high data entry concurrency, but the memory increase has been nil. (Due to some technical issues with persistent feeders the planning server more or less always does a full load, and the garbage memory left over from the multi-threaded feeder processing provides a buffer of memory for TM1 to use, such that if PI does actually need more memory we don't observe it.) Of course this scenario may or may not be relevant to you. But in the end, memory and hardware are actually very small cost components compared to other operational costs. If you need more memory, make the business case and procure it!
Re: Correlation between "Commit" times and the Transaction Log
Posted: Fri Jan 13, 2012 6:31 am
by Michel Zijlema
lotsaram wrote:bradohare wrote:Why would the trans log have anything to do with the time a commit takes? It just doesn't make sense to me.
It has changed a bit from version to version, but from 9.4 changes to the tm1s.log file are also held in RAM (I guess from a design perspective this makes rollback easier...), so if you have a large number of transaction log entries to write to disk then this will cause longer commit times.
This is interesting. If this is the case I would assume that the tm1s.log is both held in RAM and written directly to disk, as otherwise the recovery function of the log would be gone. But if that is the case, there would be no need to write the transaction log to disk at commit time, as it's already been written...
Michel
Re: Correlation between "Commit" times and the Transaction Log
Posted: Fri Jan 13, 2012 11:17 am
by Steve Rowe
Just to refocus this very interesting debate....
I think the OP is asking the following.
The calculation engine calculates some results that are derived from previously loaded data inputs (whether consolidations or rule-derived is not specified).
When the calculation run is over, a commit event is observed in TM1Top.
It is assumed that the commit event refers to the calculated results being written to the calculation cache in RAM.
However, the length of the commit seems to increase in proportion to the TM1 transaction log on disk (tm1s.log).
_____________
My view
What might be happening...
1. OP is running out of RAM on their box.
2. lotsaram says the log file is also held in RAM; the bigger the log file, the more RAM that is used.
3. A large commit of derived numbers takes place.
4. RAM is used up, so the system flips into disk paging.
5. It grinds to a halt.
Obviously I'm just guessing, but it seems to fit what we know?
It would be great if Brad could post a few details about his infrastructure so we have a bit more to go on (TM1 version, etc.).
Cheers,
Re: Correlation between "Commit" times and the Transaction Log
Posted: Fri Jan 13, 2012 12:31 pm
by Duncan P
Re: Correlation between "Commit" times and the Transaction Log
Posted: Fri Jan 13, 2012 1:28 pm
by lotsaram
Steve Rowe wrote:2. lotsaram says the log file is also held in RAM; the bigger the log file, the more RAM that is used.
I think this is a little misconstrued. For TI processes, the batch of new transactions to be written to the log file on commit of the process is held in RAM and then written to the log file in one go. There must be some upper limit to the size of a batch held in memory before an intermediate commit to disk, or else this could chew excessive memory, but I have no idea exactly what that limit is (this safety net wasn't there in 9.4.1, and a long process with logging on would reliably crash a 32-bit server due to running out of memory).