No Server Abort on a Disk Failure?

Posted: Wed Oct 22, 2008 2:58 am
by Alan Kirk
This is an infrequent issue, but it bit me this morning.

We have a number of "one off" (unscheduled) chores which populate a scenario for weekly reporting. These chores are not logged because there's far too much data. The intention is that they create a "snapshot" of the numbers at the time they were run, which the business units can then adjust without having to worry about a moving target.

We don't do a data save immediately after the chore is run because with cubes stretching into hundreds of megs it just takes far too long.

The next data save is the next morning, before people get in. (Remember that we're on 8.2.12, which has server hang issues with chore-driven data saves, so the save has to be done manually.)

This morning I got an "Unable to write to disk" error. I still haven't isolated the cause of it, but have asked the IT department to look at it. Whatever it was, it was transient since I've done data saves since then.

However, TM1 doesn't see it that way. As soon as it hits a disk error, it aborts, claiming that it hasn't saved any data. (This isn't necessarily true. If the error is encountered after some cubes have been saved to disk, then when the server restarts, those cubes, but not the others, will have all of their changed values, whether logged or unlogged. The claim will only be true if the error is hit on the first cube being written.)
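To make the partial-save case concrete, here is a toy Python simulation. It is not TM1 code and makes no claim about how the server works internally: the cube names, the `restart_after_failed_save` function and the shape of the "log" are all made up for illustration. It simply assumes that a restart loads whatever is on disk and then replays the logged changes.

```python
def restart_after_failed_save(memory, disk, logged, fail_at):
    """Simulate a save that aborts at cube index `fail_at`, followed by a restart.

    memory: cube name -> cell values currently held in memory
    disk:   cube name -> cell values from the last successful save
    logged: list of (cube, cell, value) changes recorded in the transaction log
    """
    disk_after = {name: dict(values) for name, values in disk.items()}
    for index, name in enumerate(memory):
        if index == fail_at:
            break  # hypothetical disk error: the server aborts here
        disk_after[name] = dict(memory[name])  # this cube was written in full

    # Restart: load what is on disk, then replay the logged changes.
    reloaded = {name: dict(values) for name, values in disk_after.items()}
    for name, cell, value in logged:
        reloaded[name][cell] = value
    return reloaded


# Both cubes have one logged and one unlogged change since the last save.
disk = {"CubeA": {"logged": 1, "unlogged": 1}, "CubeB": {"logged": 1, "unlogged": 1}}
memory = {"CubeA": {"logged": 2, "unlogged": 2}, "CubeB": {"logged": 2, "unlogged": 2}}
logged = [("CubeA", "logged", 2), ("CubeB", "logged", 2)]

partial = restart_after_failed_save(memory, disk, logged, fail_at=1)
first_cube = restart_after_failed_save(memory, disk, logged, fail_at=0)
```

With `fail_at=1`, CubeA comes back with both changes while CubeB keeps only its logged one. With `fail_at=0`, nothing was written and both cubes lose their unlogged values.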

This meant that although we repopulated our reporting scenario this morning... the numbers WEREN'T the same as the snapshot from the previous day.

I can't think of a single good reason for TM1 aborting on such an error. Yes, a disk error MAY be symptomatic of a serious hardware failure, but equally it may not. The data is still in memory. If the server aborts, you lose unlogged changes, without any way back. If it doesn't, and it gives you an opportunity to cancel the save and/or to try again later, at least you don't lose the unlogged changes.
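As a rough sketch of the behaviour I'd prefer, here is the idea in Python. Again, this is not TM1 code: `save_with_retry`, `write_cube` and the flaky writer are hypothetical names invented for the example. The point is only that a disk error leaves the in-memory data alone and reports which cubes are still unsaved, so the save can be retried or cancelled.

```python
import time


def save_with_retry(cubes, write_cube, max_attempts=3, delay_seconds=0.0):
    """Try to write every cube; on a disk error, keep the data in memory and retry.

    Returns the names of the cubes that are still unsaved (empty on success).
    """
    pending = list(cubes)
    for attempt in range(1, max_attempts + 1):
        still_pending = []
        for name in pending:
            try:
                write_cube(name, cubes[name])
            except OSError:
                still_pending.append(name)  # do not abort; try this cube again later
        pending = still_pending
        if not pending:
            break
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return pending


def make_flaky_writer(disk, failures):
    """Build a writer that raises OSError for the first `failures` calls."""
    remaining = {"count": failures}

    def write_cube(name, data):
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise OSError("Unable to write to disk")
        disk[name] = dict(data)

    return write_cube


disk = {}
cubes = {"Sales": {"Jan": 100}, "Costs": {"Jan": 40}}
unsaved = save_with_retry(cubes, make_flaky_writer(disk, failures=1))
```

Here the first write fails once, the retry succeeds, and `unsaved` comes back empty. If the error never clears, the function returns the list of unsaved cubes and nothing in `cubes` has been thrown away, which is all I'm really asking for.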

The only reason I can think of for aborting is that you may end up in a situation where some of the logged changes have been saved to disk in the .cub files and some haven't, but if the error DOES occur part way through a save then that ship has already sailed. The server restart should reload all of the logged values anyway, just as it currently does.

The only other reason is of course that Services are supposed to be invisible, and shouldn't be popping up dialogs.

However I think that this is a serious enough issue that the user should at least be given the OPTION of not losing all of their unlogged entries, especially as there will probably be MANY unlogged interfaces in the user base.

Thoughts?

Re: No Server Abort on a Disk Failure?

Posted: Wed Oct 22, 2008 8:13 am
by Steve Vincent
Tend to agree with you. I have seen it once on one of our servers and likewise couldn't see any legitimate reason for the failure. We too lost the changes due to the way it was handling the error, which was not a great customer experience when we were in the middle of month end. It might also tie in with an issue I found yesterday: a TI was writing an ASCII file and I cancelled it, but it left the file locked until the service was restarted. It might be linked, it might not, but I'll post a note in support to try and find out anyway.