Intermittent chore failure
Posted: Thu Oct 04, 2012 6:07 am
by Steve Rowe
Hi,
We are experiencing intermittent chore failures on one of our instances.
We have three chores
Copy files - running at 40s intervals
Load files - running at 1 min intervals
Save data - running daily.
Although these are rather extreme frequencies, this seems to run quite happily until all of a sudden it doesn't.
We are running 9.5.1, not sure of the SP (I think it's 1, not 100%)
We don't get any errors or anything and a restart clears the issue. It does seem to happen after the instance has been up for "a while" though.
Anyone have any clues as to why this may be happening? Anyone heard of some kind of upper limit on the scheduler that means that it can only run so many chores in a session?
Obviously we could go for an overnight restart, but I'm not sure that it is practical for us as we are supposed to have 24-hour uptime...
Any thoughts appreciated..
Cheers,
Re: Intermittent chore failure
Posted: Thu Oct 04, 2012 6:13 am
by Alan Kirk
I don't doubt that this is something that you've tried, but if you turn off the schedule on all three and then restart them, does that clear the problem? Or is it only a server reboot that will do it?
Re: Intermittent chore failure
Posted: Thu Oct 04, 2012 6:40 am
by Steve Rowe
Thanks Alan,
Yeah, we don't seem to be able to clear the issue in SE no matter what we try.
Another side effect is that if we run the chores manually they appear to be running but never finish...
Cheers.
Re: Intermittent chore failure
Posted: Thu Oct 04, 2012 12:16 pm
by qml
Steve Rowe wrote: We are running 9.5.1, not sure of the SP (I think it's 1, not 100%)
I thought we/you/they run 9.5.2 on all servers...
Steve Rowe wrote: Anyone heard of some kind of upper limit on the scheduler that means that it can only run so many chores in a session?
You will find that one of the TM1 models in your vicinity runs a chore every 30s, another chore every 1m, another chore every 2m and a bunch of other less frequent chores. This has worked rather nicely in production for the last 2 years or so, so it can be done in principle.
Anything about these chores that is atypical? Could you try and test this on different types of processes to see what the actual issue is and if it's to do with the scheduler or the processes themselves?
Re: Intermittent chore failure
Posted: Thu Oct 04, 2012 1:10 pm
by Steve Rowe
qml wrote:
Steve Rowe wrote:
We are running 9.5.1, not sure of the SP (I think it's 1, not 100%)
I thought we/you/they run 9.5.2 on all servers...
Probably. I'd been up since two supporting UAT in AsiaPac, joy...
Kind of good to know that this kind of high-frequency chore can run without issue, so there are no fundamental infrastructure limits.
It's not my code so I don't know the ins and outs of it... The data load is pretty fiddly, I think, checking for file locks and so on, but fundamentally it's still grab a file and pump it into TM1.
It feels like a problem with the chore scheduler rather than the TI itself. We don't get any errors and there are no hanging processes in top, but who knows???
Cheers,
Re: Intermittent chore failure
Posted: Thu Oct 04, 2012 2:45 pm
by garry cook
Possibly the concurrent chores messing up the TI indexing between each other? I've seen that before, where Chore 1 kicks off processes a,b,c and Chore 2 kicks off d,e,f, and at some point when they run they keep restarting each other, so the processes go a,b,d,a,b,a,b... forever. It happened on 9.5.1 and earlier but seems to have gone from 9.5.2.
It's a long shot, and completely inconsistent when it happens (like the squashed dim editor that crashes Excel on slice, the one you need to resize then cancel, where I've never figured out what triggers it), but it does pop up occasionally, and given your scheduling an overlap does seem possible.
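As a rough illustration of the overlap point, here is a minimal Python sketch of when a 40s chore and a 1 min chore would be due at the same instant. It assumes both schedules start at the same moment and fire on a fixed interval; the thread does not confirm either, and the TM1 scheduler may time the next run differently. The function and constant names are made up for the example.

```python
from math import gcd


def coincident_starts(interval_a: int, interval_b: int, horizon: int) -> list[int]:
    """Seconds after a shared start (t=0) at which both chores are due together.

    Assumes fixed-interval scheduling from a common start time, which is an
    assumption for illustration and not documented TM1 behaviour.
    """
    # Two fixed-interval schedules line up every least common multiple seconds.
    period = interval_a * interval_b // gcd(interval_a, interval_b)
    return list(range(0, horizon + 1, period))


COPY_INTERVAL = 40  # "Copy files" chore, seconds (from the first post)
LOAD_INTERVAL = 60  # "Load files" chore, seconds (from the first post)

if __name__ == "__main__":
    # Coincident start times within the first ten minutes.
    print(coincident_starts(COPY_INTERVAL, LOAD_INTERVAL, 600))
```

Under those assumptions the two chores line up every 120 seconds, so an overlap would not be a rare event. If either chore runs longer than its interval, the overlaps would be more frequent still.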
Re: Intermittent chore failure
Posted: Fri Oct 05, 2012 9:00 am
by Steve Rowe
Thanks Garry,
Not sure if that is possible since we don't see any activity in TM1Top. It's possible Top is misleading us too I suppose.
Cheers,