Chore crashing tm1
Posted: Fri Nov 21, 2008 3:35 pm
by mbeckw
I have a chore that I have been using for about 6 months with no problem. All of a sudden it causes tm1 to crash when it is on a schedule. I run the chore manually and have no problem but the scheduled chore always crashes the system. Items that might have an impact are:
we are using v9.1.2 as a service.
running 2 databases (migrating to new cubes in a new database and then deleting the old database)
running about 1.6 Gig of memory on a 32-bit server. Will be about 1.2 Gig when the old database is deleted.
Thanks
Re: Chore crashing tm1
Posted: Fri Nov 21, 2008 6:34 pm
by Alan Kirk
mbeckw wrote:I have a chore that I have been using for about 6 months with no problem. All of a sudden it causes tm1 to crash when it is on a schedule. I run the chore manually and have no problem but the scheduled chore always crashes the system. Items that might have an impact are:
we are using v9.1.2 as a service.
running 2 databases (migrating to new cubes in a new database and then deleting the old database)
running about 1.6 Gig of memory on a 32-bit server. Will be about 1.2 Gig when the old database is deleted.
Thanks
I suggest checking the log files to see whether another chore is scheduled to / attempting to run at the same time. There was a bug in earlier releases of 9.1 (fixed in SP4) which caused the server to crash if two chores tried to run simultaneously. (Can't recall whether it was in 9.1 SP2.)
Actually it's worth checking the server log anyway to see what (if any) messages are written at the time of the crash. Also, check the Event Viewer on the Windows server (right click on My Computer and select "Manage") to see if any error events were reported.
Finally, is one of the commands that you're using SaveDataAll? That thing's been flaky in scheduled chores for ages. I believe it's supposed to have been fixed in the latest releases, but I'll believe it when I see it. If you're doing a data save, you may want to split that off into a separate chore and see whether that's what's triggering the crash.
Re: Chore crashing tm1
Posted: Fri Nov 21, 2008 7:41 pm
by George Regateiro
Checking the log between the two different runs is a good idea. I am on 9.1 sp2 and had an incident where a chore that was running fine when manually kicked off would lock up the server.
Looking at the logs, I was seeing different behavior:
1) When run manually, all processes ran in serial.
2) When scheduled, all processes ran in parallel. The only problem was that the same process was running 4 times during the chore with different parameters, and it locked up the system.
Support was unable to help and the eventual solution was to rebuild the chore and now everything is happy.
There's not much in the old post, but it shows an example of my log:
http://applixforum.olapforums.com/viewP ... eir#161921
Re: Chore crashing tm1
Posted: Sun Nov 23, 2008 5:42 pm
by mattgoff
Locking during competing chores does seem to be fixed for us in 9.1 SP4, BUT it appears (I'm still debugging) that this does not hold true for scheduled replication. I have two different planets with overlapping RESERVE rights-- it looks like if their replications occur at the same time it's random which planet's data wins, even if only one planet has changes.
Meaning/Scenario: Planet A has changes + Planet B has none + coincident replication with a shared Star = original values could be written back to Planet A erasing updates. Since it's a race condition, this can be inconsistent too.
Not sure how many people are using replication, but it's something to be cautious about.
Matt
Re: Chore crashing tm1
Posted: Tue Nov 25, 2008 1:00 am
by Gregor Koch
The "locking during competing chores" is definitely not fixed in 9.1 SP4.
We are testing 9.1 SP4 (to upgrade from 9.0 SP3 U8) and have so many problems with it, one of them being competing chores which hang the server, that the upgrade has been postponed.
Cheers
Re: Chore crashing tm1
Posted: Tue Nov 25, 2008 3:41 am
by Alan Kirk
Gregor Koch wrote:The "locking during competing chores" is definitely not fixed in 9.1 SP4.
We are testing 9.1 SP4 (to upgrade from 9.0 SP3 U8) and have so many problems with it, one of them being competing chores which hang the server, that the upgrade has been postponed.
I'm not sure whether we've started to talk about two different things here. In 9.1 SP3, if you ran two chores at the same time it wouldn't lock the server or hang it; it would tank the thing, crash it, terminate it with extreme prejudice, the server wouldn't be pining, it would be passed on. The server session would be no more. It ceased to be. It expired and went to meet its maker. This is a late server session. It's a stiff. Bereft of life, it rests in peace. The server session wouldn't voom if you put four thousand volts through it, and believe me after that happened a few times that was awfully tempting.
This crash would be accompanied by a Windows memory read / write error dialog on the server box which would prevent the service from restarting until you cleared it.
That error does indeed appear to have been fixed in SP4, or at the very least we haven't been able to reproduce it using the same chores that would crash SP3.
This of course doesn't necessarily mean that there aren't OTHER problems with it, one of which may be what's being referred to here, though we haven't encountered them as yet.
Re: Chore crashing tm1
Posted: Wed Nov 26, 2008 1:14 am
by Gregor Koch
Alan,
I think I did start talking about something else. Sorry about that.
Wasn't talking about a (Monty Python-) server crash.
In the end the result for us is pretty similar though, because after the two competing chores hang each other all that is left to do is to restart the server.
The solution, and I think this was mentioned elsewhere before, is to have the involved processes write a flag to indicate they are running and have all chores check that flag.
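The flag idea can be pictured with a minimal sketch. This is Python, not TurboIntegrator, and it uses a file as the flag; in TM1 the flag would more likely be something like a cell in a control cube that every chore reads first, but that is an assumption and not something spelled out in this thread. The names `running_flag`, `run_chore` and `ChoreAlreadyRunning` are hypothetical, made up for illustration only.

```python
import os
from contextlib import contextmanager


class ChoreAlreadyRunning(Exception):
    """Raised when another chore already holds the flag."""


@contextmanager
def running_flag(flag_path):
    # O_CREAT | O_EXCL makes "check and set" a single atomic step:
    # the call fails if the flag file already exists.
    try:
        fd = os.open(flag_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise ChoreAlreadyRunning(flag_path)
    try:
        # Record who set the flag, which helps when cleaning up by hand.
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield
    finally:
        # Always clear the flag, even if the chore's work fails.
        os.remove(flag_path)


def run_chore(name, flag_path, work):
    """Run work() only if no other chore holds the flag.

    Returns True if the work ran, False if this run was skipped.
    """
    try:
        with running_flag(flag_path):
            work()
            return True
    except ChoreAlreadyRunning:
        print(f"{name}: another chore is running, skipping this run")
        return False
```

One caveat with any flag like this: if the server dies while the flag is set, the flag stays set and every later chore will skip itself until someone clears it, so a startup step that resets the flag is probably needed.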
Cheers
PS:
"Tis but a scratch"
"A scratch?! Your arm's off!"
"No, it isn't."
Re: Chore crashing tm1
Posted: Wed Nov 26, 2008 4:31 pm
by Steve Vincent
<refrains from Monty Python post in aid of remaining on topic>
Had that happen in 9.0.3 on a few occasions too, so it's been around a while. One chore that copies some performance monitor stuff, and any number of others where it kicked in while that one was running. Sometimes it would complete (but much, much slower than usual); other times I decided to give up and killed the service manually. Always seemed to happen at the worst possible time too...
I have played with 9.4 a bit (but had to remove it today due to other work earmarked for the server) but didn't get to testing the chore issue. I'd had enough problems with the new audit logs before I started to look into it in any depth.
now where is that manual on how not to be seen...