
Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 5:27 am
by damientaylorcreata
Hi Guys,

Our TM1 server currently has 8 CPUs! However, I took the time today to see how well the TM1 load is spread across all 8 CPUs, and I was surprised to see that no matter how much load I placed on TM1 (running chores, opening views, etc.), all of the activity was concentrated on one CPU only! I remember seeing something about a setting that lets you specify how many CPUs are available so that TM1 can spread the load, but I am unable to find it again now.

Could somebody please let me know how to spread the load across all available CPUs?

Thanks,
Damien

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 5:35 am
by Alan Kirk
damientaylorcreata wrote: Could somebody please let me know how to spread the load across all available CPUs?
You can't. When it comes to parallel processing, TM1's gonna do what it's gonna do, and that's not very much. (Edit: See also this thread: http://forums.olapforums.com/viewtopic. ... 885&p=4990.)

What you're remembering is the MaximumCubeLoadThreads parameter from the tm1s.cfg file:
Specifies whether the cube load and feeder calculation phases of server loading are multi‐threaded, so multiple processors can be used in parallel.

This results in decreased server load times.

To run in multi‐threaded mode, you should set MaximumCubeLoadThreads to the number of processors on the TM1 server that you want to dedicate to cube loading and feeder processing.

Generally, the best performance is achieved when the parameter is set to a value equal to (number of available processors) - 1. For example, if the TM1 server is running on a four‐processor machine, MaximumCubeLoadThreads should be set to 3. This ensures that one processor is available to run other applications while the TM1 server is loading.

When MaximumCubeLoadThreads is set to 0, cube loading and feeder processing is NOT multi‐threaded. This is the default behavior when MaximumCubeLoadThreads is not explicitly set in the Tm1s.cfg file.

NOTE: When MaximumCubeLoadThreads is enabled, TM1 cannot manage the order in which feeders are calculated. There may be cases where processing order has an adverse effect on your application due to some order‐of‐evaluation dependencies in the multi‐threaded environment. If your TM1 model uses conditional feeders where the condition clause contains a fed value, you should set MaximumCubeLoadThreads=0 or exclude the parameter from the Tm1s.cfg file to disable the use of multiple threads at load time.
However, this only affects loading. It has no effect once loading is done.
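The documented rule of thumb can be captured in a small Python helper. This is only a sketch: the function names are hypothetical, and the code simply encodes the "(number of processors) - 1" guidance and the conditional-feeder caveat quoted above. It does not inspect a real model, and the setting still only affects server load time.

```python
import os


def suggested_max_cube_load_threads(processors: int,
                                    conditional_feeders_on_fed_values: bool = False) -> int:
    """Hypothetical helper encoding the tm1s.cfg guidance quoted above."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    if conditional_feeders_on_fed_values:
        # Feeder order matters for this model, so keep loading single-threaded.
        return 0
    # Leave one processor free for other work while the server loads.
    # On a single-processor machine this yields 0, i.e. no multi-threading.
    return processors - 1


def cfg_line(threads: int) -> str:
    """Format the setting as a tm1s.cfg entry."""
    return f"MaximumCubeLoadThreads={threads}"


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # os.cpu_count() can return None
    print(cfg_line(suggested_max_cube_load_threads(cpus)))
```

Note that `os.cpu_count()` reports the logical CPUs of the machine running the script, which may not be the TM1 server itself.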

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 6:13 am
by damientaylorcreata
Thanks for such a fast reply!!! Your post was helpful. I now have a better understanding of how TM1 handles (or does not handle) multithreading. So for us it would be better to have one fast processor than eight slower ones.

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 7:16 am
by Martin Erlmoser
If you have 8 users who each open a big view or start a process, you should see 100% CPU usage, not 12.5%, so it does make sense to use more than one core.

One thread gets at most one core.
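The 12.5% figure is just one busy core out of eight. As a rough sketch (assuming each busy thread can keep at most one core fully occupied, and ignoring everything else running on the box), the ceiling on total CPU usage works out like this:

```python
def max_cpu_usage(busy_threads: int, cores: int) -> float:
    """Ceiling on total CPU usage (in percent) when each busy thread can
    occupy at most one core at a time."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 100.0 * min(busy_threads, cores) / cores


print(max_cpu_usage(1, 8))  # 12.5  -> one busy thread on an 8-core server
print(max_cpu_usage(8, 8))  # 100.0 -> eight busy threads, e.g. eight users
```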

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 11:58 am
by lotsaram
Each thread can use only one core. How are you generating the load? If you are on 9.0 and loading with TI, the processes will just queue; however, for user read requests 9.0 should still multithread quite effectively.

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 2:05 pm
by LoadzaGrunt
GRUNT IS GOOD

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 2:35 pm
by Steve Rowe
Ahh, but could you define Grunt? :P

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 9:08 pm
by rollo19
I would love to hear thoughts on what sort of GRUNT you would want for a reasonably complex budgeting model (HR, CAPEX, P&L, B/S/C/F) for 100 users. I would hate to blow it up or have slow response times just as everyone hits F9 at 4 PM on Friday.

Starting with:
> one scalable (virtualised) 64-bit Win 2003 server for the TM1 server (RAM as required): how much 'grunt' could/should I throw at this?
> another scalable 32 or 64?? bit Win 03 server for TM1Web: I understand this needs to fire up an instance of the Excel service for each client, so how much 'grunt' could/should I throw at this?
(This dual-server configuration was suggested in this thread by Brian Barnes, ex-Applix: http://www.mombu.com/programming/object ... 60543.html)

Server replication is considered to be a needless pain in the .. and I would like to avoid it.

Windows Server 2003 Standard Edition has "up to 4-way symmetric multiprocessing (SMP) support", which according to some threads means you can have 4 CPUs, each quad-core, so up to 16 cores. But as noted in this thread, TM1 is not too good at multi-core threading: http://forums.olapforums.com/viewtopic. ... 885&p=4990

Cheers

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 9:22 pm
by Alan Kirk
rollo19 wrote:I would love to hear thoughts on what sort of GRUNT you would want for a reasonably complex budgeting model (HR, CAPEX, P&L, B/S/C/F) for 100 users - I would hate to blow it up/have slow response times just as everyone hits F9 Friday 4PM
I think that there is a slight misconception here... "everyone hits [F9]" is not where problems arise.

"Person 1 enters a value, person 2 reads a consolidation which is dependent on that value, person 1 enters another value, person 3 hits [F9]..." THAT'S where you start to get slow response times. If everyone's hitting [F9] and reading off pre-calculated figures the response time will be fine. When the figures are changing and the outputs constantly need to be recalculated (especially if rules are involved) then things start getting stuck in a queue.
rollo19 wrote: Starting with:
> one scalable (virtualised) 64 bit Win 2003 server for the TM1 server (ram as reqd) - how much 'grunt' could\should I throw at this?
> another scalable 32 or 64?? bit
This is a "how long is a piece of string" question since it's going to depend very much on the design of your model. But in relation to those question marks it's worth bearing in mind that 64 bit doesn't make anything faster, it just increases the amount of data that you can store in the model before you start bouncing on the memory ceiling of a single application.
rollo19 wrote: Win 03 server for TM1Web - understand this needs to fire up an instance of the Excel service for each client so..
I'd love to know where you got that understanding from, 'cos it just ain't so. I have 2 users on Web at the moment, but just the one Excel service. And that's the way I'd expect it to be.

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 9:52 pm
by rollo19
Thanks Alan. Are you running on TM1? That was a smokin' fast response.

Yes, what I was trying to say is that if changes are submitted at a driver level (e.g. headcount), rules run (not just hierarchy consolidations) to calculate P&L lines, B/S impact, etc. So what if 50+ users start hammering in numbers that require a recalc, all at the same time?

Re 'how much grunt' for the TM1 server, I appreciate it is a 'who knows' without having built it. Perhaps I could rephrase that: how much processing grunt could I throw at (and budget for) this if needed? If multi-core threading is an issue, are 4 quad-core CPUs a waste of time?

Yep, I appreciate the differences between 64 and 32 bit. I was just wondering whether TM1Web chews much RAM with 100 users; if not, and since Excel is a 32-bit app, then 32-bit Windows for TM1Web would be fine. Just wondering if I might need any 'grunt' for that. 64 bit for the TM1 server, so that it can scale to huge volumes of data, which may be needed here for analysis.

Re multiple instances of Excel: perhaps I misunderstood this thread? http://www.mombu.com/programming/object ... 60543.html

Re: Performance Use of Multiple CPU's

Posted: Thu Jun 25, 2009 10:43 pm
by rollo19
I found that Intel's Nehalem quad-core processors allow you to turn off a couple of the cores and overclock the remaining ones. Could this be a beneficial configuration for a TM1 server?

Re: Performance Use of Multiple CPU's

Posted: Fri Jun 26, 2009 12:40 am
by rollo19
So, if I can summarise what I gather from these discussions: TM1 scales in RAM to almost any data volume but is not too good at multi-core processing. Calculation requests (people submitting numbers to the server that require extensive rule calculations) tend to get queued on a single core, meaning possible delays and locking issues at busy times. Mitigating steps:
• Separate TM1Web and TM1 servers.
• Optimised model design: use pre-calculated results (TI) and hierarchy consolidations where possible.
• A solid network connection to minimise client-server and server-server response times, plus optimised/minimal TM1Web form design.
• An overclocked top-end processor (e.g. a Nehalem chip with two cores turned off, allowing the other two to run faster/hotter).
If user numbers are getting up there (>50), performance is an issue, and individual slice sizes can be kept below ~1.5M cells, use Planning Contributor.

Re: Performance Use of Multiple CPU's

Posted: Fri Jun 26, 2009 1:15 am
by rollo19
TM1 server planning, sizing, and hardware performance: http://www.ibm.com/developerworks/data/ ... ge205.html
"a 64-bit TM1 installation on a 64-bit operating system requires between 30% and 100% more RAM due to the larger memory pointers required." It also recommends 64-bit Xeon (e.g. Nehalem) chips.

Re: Performance Use of Multiple CPU's

Posted: Fri Jun 26, 2009 3:50 am
by Alan Kirk
Alan Kirk wrote:
rollo19 wrote: Win 03 server for TM1Web - understand this needs to fire up an instance of the Excel service for each client so..
I'd love to know where you got that understanding from, 'cos it just ain't so. I have 2 users on Web at the moment, but just the one Excel service. And that's the way I'd expect it to be.
OK, I see where you got it from now; the Brian Barnes link that you provided.

(Sidebar: That's one helluva forum that mombu one... it has the date of a posting, the month of the posting, the time of the posting...

Pity it doesn't have the year.

However considering that he describes himself in the post as "Applix Product Management" and states that "TM1 Web is exciting new (sic) technology", methinks that post was a while ago and some things may have changed.)

It's an interesting article, but when you said "Excel Service" I thought that you were referring to THE Excel service; that is, the one named TM1 Excel Service in the Services list (or TM1ExcelService.exe in the processes list).

I stand by my statement that there is but one of those.

However when a user queries a Websheet for the first time, it does kick off a session of the Excel application. (That is, Excel.exe.) It's not a visible session, and more importantly it only exists long enough to render the websheet. You can see it flash ephemerally in Task Manager as the user makes the request. Also, it only happens the first time you request the websheet; close and re-request the same one and no new Excel.exe session will be launched, presumably since the websheet has already rendered.

Although this may have changed between the original incarnation of Web and the 9.1 version that I'm using, what it does mean is that if you have 10 users on Web you ain't gonna be seeing 10 Excel sessions (of any kind) running on the server, or if you do it'll be for but a moment. (Interestingly, though, I can't get two excel.exe sessions to launch either even when trying to get 2 or more users to simultaneously launch different websheets; I'm wondering whether under 9.1 at least, Web will use an existing session if it's already there rather than wearing the overhead of creating a new one.)
rollo19 wrote:.. found Intel's Nehalem quad core processors allow you to turn off a couple of the cores and over clock the remaining cores. Could this be a beneficial configuration for TM1 server?
You can call me an ol' stick in the mud (since "Curmudgeon" is taken, or used to be...), but I'm none too sure about using overclocking to run a business application. If you're playing HALO (or were back in the days when the CPU was more important than the GPU), hey, go for it. Worst that happens is that if the system fries, you don't get to kill scary aliens until you save up enough for a new box. If a business-critical system fries, on the other hand, the results may be rather more disturbing.

The other thing that you'd need to consider (and your IT department would probably need to be involved in this) is the extent to which overclocking the CPU may void your maintenance contracts.

To me (and I readily concede that others' mileage may vary on this), there's a reason that chips for business machines are clocked as they are. I grant that part of it is conservatism on the chipmakers' part, but another part of it is the whole world of difference between a business environment and a gaming one. (Especially as business servers can be running 24/7 for literally years on end, something that overclocked gaming machines rarely do, even for Civilization IV addicts.)

Re: Performance Use of Multiple CPU's

Posted: Sat Jun 27, 2009 4:47 am
by rollo19
Brilliant, that certainly clarifies the Excel story, and it stacks up when you look at the Applix notes on sizing: a reasonably humble TM1Web server for up to 100 users.

And noting your comment that 64 bit is not necessarily any faster than 32: if you consider the length of the address pointers, then without bigger processor caches (64-bit pointers can be some 50% bigger) and/or pointer compression, the 64-bit engine may take longer to calculate than the 32-bit one.

Overclocking the Nehalem: that came from an HP blade sales guy. It's an option in the BIOS. By turning off two cores (producing less heat at normal speeds), it can automatically run up the speeds of the other two cores (producing more heat) and regulate itself to stay within the designed thermal envelope (unlike some gaming rigs...). So yes, it is supported under warranty for 24/7 use.

So TM1 seems to recalc serially on each client write request, through a single core. Bombard it with write requests faster than it can calculate (fast as it is) and the requests get queued, causing delays.
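That queueing effect can be illustrated with a toy model: one worker thread serves requests strictly one after another, in arrival order. This is only a sketch under stated assumptions (the `Request` type, the timings and the FIFO order are invented for illustration), not a description of how TM1 actually schedules work internally:

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    arrival: float  # seconds after some reference point
    work: float     # seconds of single-threaded work the request needs


def simulate_single_thread(requests: list[Request]) -> dict[str, float]:
    """Return each request's waiting time when one thread serves requests
    strictly in arrival order (a simple FIFO queue, no preemption)."""
    waits: dict[str, float] = {}
    free_at = 0.0  # time at which the single worker thread becomes idle
    for req in sorted(requests, key=lambda r: r.arrival):
        start = max(req.arrival, free_at)
        waits[req.name] = start - req.arrival
        free_at = start + req.work
    return waits


# Hypothetical burst: five users submit at the same moment, 2 s of work each.
burst = [Request(f"user{i}", arrival=0.0, work=2.0) for i in range(1, 6)]
print(simulate_single_thread(burst))
```

With five 2-second requests arriving together, the fifth waits 8 seconds before it even starts. Extra cores do not shorten that wait as long as all of the work has to pass through the one thread.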

It seems unnecessary to me to recalculate on every little update when many users are writing back (on different slices) at the same time (e.g. bottom-up). I wonder if IBM plans to tidy up TM1Web so that it analyses client write requests and groups them into a single update for the TM1 server when it is free: pretend parallel processing. Or some other wizardry, like automated behind-the-scenes server replication. Ah, it's easy to say from the user camp; IBM reportedly has a huge development team on this, so I am expecting a lot.
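The batching idea can be sketched as a buffer that collects cell writes and hands them over as one combined update, with later writes to the same cell replacing earlier ones. This is purely hypothetical: `WriteBatcher` and the cell-address tuples are invented for illustration and do not correspond to any TM1 or TM1Web feature or API.

```python
Cell = tuple[str, ...]  # hypothetical cell address, e.g. ("Sales", "Jan")


class WriteBatcher:
    """Hypothetical buffer that coalesces cell writes into one update."""

    def __init__(self) -> None:
        # Plain dicts keep insertion order; re-assigning a key keeps its slot.
        self.pending: dict[Cell, float] = {}

    def submit(self, cell: Cell, value: float) -> None:
        """Queue a write; a later write to the same cell overwrites the earlier one."""
        self.pending[cell] = value

    def flush(self) -> dict[Cell, float]:
        """Hand back everything queued so far as one batch and start afresh."""
        batch, self.pending = self.pending, {}
        return batch


batcher = WriteBatcher()
batcher.submit(("Sales", "Jan"), 100.0)
batcher.submit(("Cost", "Jan"), 40.0)
batcher.submit(("Sales", "Jan"), 120.0)  # supersedes the first Sales write
print(batcher.flush())  # {('Sales', 'Jan'): 120.0, ('Cost', 'Jan'): 40.0}
```

In this sketch, three submissions become one two-cell update; the trade-off is that users' changes only reach the server when the buffer is flushed.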

Re: Performance Use of Multiple CPU's

Posted: Sat Jun 27, 2009 8:43 am
by David Usherwood
My take on a few of your points.
64 bit pointers are _always_ twice the size of 32 bit pointers - they have to be. The _server_ size isn't twice that under 32 bit because some of the memory (fortunately) is occupied by data!
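A back-of-the-envelope model shows why the server does not simply double in size. Assume, as a simplification, that only pointers change (4 bytes to 8 bytes) while data stays the same size, and ignore alignment padding; then if pointers make up a fraction p of the 32-bit footprint, the 64-bit footprint is roughly (1 + p) times the 32-bit one. The function below is a hypothetical sketch of that arithmetic:

```python
import ctypes


def estimated_64bit_footprint(mem_32bit: float, pointer_share: float) -> float:
    """Rough 64-bit memory estimate from a 32-bit footprint (same units).

    Simplified model: pointers double in size (4 -> 8 bytes), everything
    else (the data) is unchanged, and alignment padding is ignored.
    """
    if not 0.0 <= pointer_share <= 1.0:
        raise ValueError("pointer_share must be between 0 and 1")
    return mem_32bit * (1.0 + pointer_share)


# 8 on a 64-bit Python build, 4 on a 32-bit one.
print("pointer size:", ctypes.sizeof(ctypes.c_void_p), "bytes")

# A hypothetical 1,000 MB 32-bit server where pointers are 30% of memory.
print(estimated_64bit_footprint(1000, 0.3))
```

Only a model made entirely of pointers, with no data at all, would double in size under this approximation.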

TM1 doesn't recalc after a write. What it does is throw away all calculation results in the cube with the new data and in all 'dependent' cubes, i.e. those directly or indirectly fed by the cube with the new data. This was written up some time ago by Manny Perez in a document about the 'ReadersBypassWriters' config parameter. This _predates_ the new locking mechanism in 9.1, which has nothing to do with calculation but relates to input. Pre-9.1, input locked the server; in 9.1 and later, it locks the cube (not sure about linked cubes here). This is why you have to go the unlinked microcube route to get the benefit from 9.1/9.4.
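The invalidation described above is essentially a walk over a dependency graph: a write marks the target cube and every cube downstream of it as stale, and nothing is recomputed until someone reads those values again. A minimal sketch follows, assuming a hypothetical `dependents` mapping from each cube to the cubes that draw on it directly; the cube names are invented for the example:

```python
from collections import deque


def cubes_to_invalidate(written_cube: str, dependents: dict[str, set[str]]) -> set[str]:
    """Return the cube that was written to plus every cube that depends on it,
    directly or indirectly (breadth-first search over the dependency graph)."""
    stale = {written_cube}
    queue = deque([written_cube])
    while queue:
        cube = queue.popleft()
        for child in dependents.get(cube, set()):
            if child not in stale:  # skip cubes already marked (also handles cycles)
                stale.add(child)
                queue.append(child)
    return stale


# Hypothetical model: each cube maps to the cubes that draw values from it.
deps = {
    "Headcount": {"Payroll"},
    "Payroll": {"PnL"},
    "Capex": {"PnL", "BalanceSheet"},
    "PnL": {"BalanceSheet"},
}

# A write to Headcount discards cached results here; Capex is untouched.
print(sorted(cubes_to_invalidate("Headcount", deps)))
# ['BalanceSheet', 'Headcount', 'Payroll', 'PnL']
```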

Calculation is (as you and others have spotted) single threaded. The only things TM1 multithreads are reading and (if you turn the cfg parameter on) feeder loading on startup.

In my view the best way to use a modern multicore server would be to run multiple TM1 instances, which depending on when you bought your server is not as costly as it used to be. But the server to server communication capabilities are still very limited indeed. I've been pushing this for many years and got nowhere.