Unknown crashes in 9.1.4

Posted: Tue Oct 26, 2010 1:35 pm
by mweldon
Apologies for the lack of detail I am able to provide on this issue, but that is precisely the problem!

We are running 9.1 SP4 (along with EV 9.4). I keep seeing this problem in our production environment: a user runs a process from an action button, and within 2 or 3 minutes of the process finishing successfully the server crashes for no obvious reason. It does not seem to be a problem with specific processes, as it happens with different processes run by different users. Some of the processes trigger other processes. I have been keeping details of each crash (TM1 logs etc.) but can't see anything obvious other than a process having recently finished.

Anyone have any ideas/thoughts on what it could be?

Many thanks

-Marc

Re: Unknown crashes in 9.1.4

Posted: Wed Nov 03, 2010 9:33 pm
by rkaif
It could be because of insufficient hardware resources.

How big are your TM1 models? How much spare RAM do you have? How many simultaneous users do you have?

Re: Unknown crashes in 9.1.4

Posted: Thu Nov 04, 2010 9:16 am
by mweldon
Memory usage runs to about 12.6 GB on our 64-bit server, which has 16 GB of RAM. We typically have between 10 and 20 simultaneous users, and when one of them runs a process it usually holds up other people's views. Do you think some extra RAM would help?

Re: Unknown crashes in 9.1.4

Posted: Thu Nov 04, 2010 1:03 pm
by jstrygner
From what you say, you still have about 3 GB left on your 64-bit server, so more RAM will not help (anything that must load into it, loads).

For your simultaneous users, more CPU cores could help, but it depends on what the process does (which cubes it writes to) and what users are trying to read while the process is executing (see this thread: http://forums.olapforums.com/viewtopic. ... 511#p15778).

HTH

Re: Unknown crashes in 9.1.4

Posted: Thu Nov 04, 2010 1:54 pm
by tomok
jstrygner wrote: From what you say, you still have about 3 GB left on your 64-bit server, so more RAM will not help (anything that must load into it, loads).
We don't really know that. If that 3 GB is what is left BEFORE the process starts, the process could be using up this last 3 GB and then crashing. If you are sitting at approx 80% memory utilization (12.6/16) then it is time to add more RAM to your box. 20% is not enough cushion, IMO. If it were me I would strive for a 30 to 50% cushion to be safe. RAM is cheap (relatively) compared to the cost of lost productivity.
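
To make the arithmetic above concrete, here is a rough sketch in Python. The function names are made up for illustration, and it only looks at the figures quoted in this thread (12.6 GB used out of 16 GB); it says nothing about how much extra memory a running process actually needs.

```python
def headroom(used_gb: float, total_gb: float) -> float:
    """Fraction of physical RAM still free at the given usage."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return (total_gb - used_gb) / total_gb


def ram_for_cushion(used_gb: float, cushion: float) -> float:
    """Smallest total RAM (in GB) that leaves `cushion` (0..1) free at used_gb."""
    if not 0 <= cushion < 1:
        raise ValueError("cushion must be in [0, 1)")
    return used_gb / (1 - cushion)


if __name__ == "__main__":
    used, total = 12.6, 16.0  # figures quoted earlier in the thread
    print(f"utilization: {used / total:.0%}")
    print(f"headroom:    {headroom(used, total):.0%}")
    for target in (0.30, 0.50):
        needed = ram_for_cushion(used, target)
        print(f"RAM for a {target:.0%} cushion: {needed:.1f} GB")
```

On those figures a 30% cushion needs roughly 18 GB of total RAM and a 50% cushion roughly 25 GB, so a 16 GB box falls short of either target.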

Re: Unknown crashes in 9.1.4

Posted: Thu Nov 04, 2010 4:57 pm
by rkaif
mweldon wrote: Do you think some extra RAM would help?
Yes, I think upgrading the RAM would help, because the process you are running could be consuming the remaining RAM (you never know).

Also, when you have many simultaneous users you should have a multi-core processor (if you don't have one already).

Re: Unknown crashes in 9.1.4

Posted: Thu Nov 04, 2010 5:06 pm
by mweldon
As the day has gone on I have noticed the memory usage creep up to nearer 14 or 15 GB, so I have kicked off the procedure to get another 16 GB added in :D

Many thanks for all of your responses. I have been very impressed with this forum since I started using it.

Cheers

-Marc

Re: Unknown crashes in 9.1.4

Posted: Thu Nov 11, 2010 1:22 pm
by mweldon
Just a quick update: I ran the same model on our QA server, which has less memory, and as expected it crashed shortly after fully loading up. However, this time the TM1 logs contained the following:

3916 ERROR 2010-11-10 16:30:52,077 TM1.Server.Memory CommonAlloc - alloc (size = 131072) failed: The paging file is too small for this operation to complete.
3916 WARN 2010-11-10 16:30:52,109 TM1.Server.Memory CommonAlloc() outOfMemory Exception <<< MEMORY_ALMOST_FATAL_LEVEL >>> - threadID "3916" - apifunc# "121"
3916 WARN 2010-11-10 16:30:52,109 TM1.Server.Memory al_Alloc() outOfMemory Exception <<< MEMORY_FATAL_LEVEL >>> - threadID "3916" - apifunc# "121"
3916 ERROR 2010-11-10 16:30:52,109 TM1.Server TM1 Server Abort: System Out Of Memory.


This is pretty much what I expected, but I am slightly unsure why I did not get these same messages in my live environment. Weird!
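
For anyone who wants to check their own server log for the same messages, below is a minimal Python sketch that picks out the memory-related lines. It assumes the line layout seen in the excerpt above (thread ID, level, date, time, logger name, message) and nothing more. The names `LogEntry`, `parse_line` and `memory_events` are invented for this example, and the sample lines are simply the four from this post.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Layout assumed from the excerpt: thread, level, date, time, logger, message.
LINE_RE = re.compile(
    r"^(?P<thread>\d+)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<logger>\S+)\s+(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    thread: str
    level: str
    timestamp: str
    logger: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return LogEntry(m["thread"], m["level"], f"{m['date']} {m['time']}",
                    m["logger"], m["message"])


def memory_events(lines: Iterable[str]) -> List[LogEntry]:
    """Keep entries from the memory logger or mentioning 'out of memory'."""
    events = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if (entry.logger == "TM1.Server.Memory"
                or "out of memory" in entry.message.lower()):
            events.append(entry)
    return events


SAMPLE = [
    '3916 ERROR 2010-11-10 16:30:52,077 TM1.Server.Memory CommonAlloc - alloc (size = 131072) failed: The paging file is too small for this operation to complete.',
    '3916 WARN 2010-11-10 16:30:52,109 TM1.Server.Memory CommonAlloc() outOfMemory Exception <<< MEMORY_ALMOST_FATAL_LEVEL >>> - threadID "3916" - apifunc# "121"',
    '3916 WARN 2010-11-10 16:30:52,109 TM1.Server.Memory al_Alloc() outOfMemory Exception <<< MEMORY_FATAL_LEVEL >>> - threadID "3916" - apifunc# "121"',
    '3916 ERROR 2010-11-10 16:30:52,109 TM1.Server TM1 Server Abort: System Out Of Memory.',
]

if __name__ == "__main__":
    for e in memory_events(SAMPLE):
        print(e.timestamp, e.level, e.logger, "|", e.message)
```

To run it against a real log, pass an open file object to `memory_events` instead of `SAMPLE`. If the live server is killed before it can write these lines, nothing will match, which may be why they only showed up on QA.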