Server crash due to subset corruption
Posted: Wed Sep 10, 2008 7:56 am
by DevGreg
Dear all,
We're experiencing a very annoying issue with our TM1 server, so I'd like to know if you've ever encountered the same problem...
Our server is v9.0 SP3, and is running under Windows Server 2003. It is rebooted every night, as recommended by the former Applix team.
However, for no particular reason, some time during the working day, a subset starts to block everything.
I mean that, if I click on this subset, or if I try to reach it via a TI process, or if I launch a view that uses it, my action blocks every other action for all users.
We use the TM1 Top Custom application to terminate the action, but we have no solution other than rebooting the server to get everything back to normal.
This problem is occurring more and more often as the number of users grows, and it is becoming critical for us.
Have you ever experienced it?
What do you think is the cause of the problem? (corrupted RAM maybe?)
Is there any way to investigate this kind of problem in TM1? (I'm not the one operating the server)
Would an upgrade of TM1 solve it? Moving to Unix? Changing the hardware configuration? Rebooting more often?
Thanks in advance for your answers!
Regards,
Greg
Re: Server crash due to subset corruption
Posted: Wed Sep 10, 2008 9:00 am
by jim wood
Hi Greg,
I have not experienced this problem before. The way I would get round it is:
1) Bring the server down.
2) Delete the subset off the hard drive.
3) Restart server.
4) Rebuild the subset as soon as the server is back up.
This should solve any file corruption problems. If it still occurs, then you need to look at what is in the subset. Is there a particular measure that has a complicated or large calculation? If there is, can the calculation be simplified? Also check any feeders to make sure you have not overfed a measure. To check this, you may want to complete the above 4 steps but rebuild the subset one element at a time after the restart to identify the culprit.
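Step 2 can be made reversible by moving the subset file aside rather than deleting it outright. Below is a minimal Python sketch of that idea, to be run only while the server is down. The function name quarantine_file and both paths are hypothetical, not anything TM1 provides; substitute the real location of the subset file in your own data directory.

```python
import os
import shutil


def quarantine_file(path, quarantine_dir):
    """Move `path` into `quarantine_dir` and return its new location.

    Hypothetical helper: it refuses to overwrite an earlier copy so
    that a previous backup of the same file is never lost.
    """
    os.makedirs(quarantine_dir, exist_ok=True)
    target = os.path.join(quarantine_dir, os.path.basename(path))
    if os.path.exists(target):
        raise FileExistsError(target)
    return shutil.move(path, target)
```

If the rebuilt subset turns out not to be the cause, the original file can be moved back during the next shutdown.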
I hope this helps,
Jim.
Re: Server crash due to subset corruption
Posted: Wed Sep 10, 2008 9:10 am
by DevGreg
Hello Jim,
Thanks for the quick answer.
The problem is that it's not one particular subset that creates this problem, but random subsets!
They can be either dynamic or static. So far they haven't been on measure dimensions, but that's the whole point of the problem: there's no way for us to identify how their handle gets "corrupted" at some point...
That's why I was suspecting hardware issues: maybe there are some addresses in our RAM that don't work properly, and when TM1 uses these defective bits of RAM, it blocks the system...
Do you think this explanation is plausible?
Re: Server crash due to subset corruption
Posted: Wed Sep 10, 2008 9:57 am
by jim wood
Sounds like a viable answer. Do you have a backup server that you can test this theory on? If not, you could always try smaller chunks, say on a laptop. The latter is not ideal, as it is not a true reflection of your live environment, but it may lead you to a more specific cause. (If there is one.)
Jim.
Re: Server crash due to subset corruption
Posted: Wed Sep 10, 2008 9:59 am
by jim wood
Just thought of something else. Have you tried turning the logging on to maximum to make sure you track every calculation? This may also give you something that you can pass on to Cognos Support. (To find out how to change your logging setting, check out the operations manual that you'll find in the installation directory on your server.)
Jim.
Re: Server crash due to subset corruption
Posted: Wed Sep 10, 2008 12:52 pm
by DevGreg
Thanks again Jim for your feedback.
We only have one server at our disposal, but I'll be pushing my colleagues and management so that we get another machine to perform these tests...
Logging is already implemented for all our cubes. We haven't analysed it, but I doubt that we'll be able to find anything in it...
Indeed, these crashes also occurred on our development server (based on the same machine), and they were triggered by various events, none of which was data entry: saving a rule, changing access rights, launching a TI...
Re: Server crash due to subset corruption
Posted: Wed Sep 10, 2008 2:30 pm
by jim wood
Hello again. When I say logging, I mean server logging. If you set this to the lowest level, every calculation and action is traced. It is as near to a server dump as you can get.
Jim.
PS. As I said earlier, have a quick look through the operations manual. It goes through all the logging options.
Re: Server crash due to subset corruption
Posted: Wed Sep 10, 2008 9:24 pm
by paulsimon
Greg,
I have had cases of rogue MDX subsets in that version causing crashes, but never random issues. I would do a search on the data directory for *.??$ (star dot question question dollar) to see if it finds any files. These can be left over from failed attempts by TM1 to save a file.
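The search for *.??$ can also be scripted. The Python sketch below walks a directory tree and lists every file whose name matches that wildcard. The function name find_leftover_files is hypothetical, and the script only reports the files; it does not delete anything.

```python
import fnmatch
import os


def find_leftover_files(data_dir, pattern="*.??$"):
    """Return the paths under `data_dir` whose file name matches `pattern`.

    In fnmatch wildcards, '?' matches exactly one character and '$' is
    a literal dollar sign, so the default pattern selects names that
    end in a dot, two characters and a '$'.
    """
    matches = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Calling find_leftover_files with the path of the TM1 data directory returns a list that can be reviewed before deciding what to do with the files.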
Another possibility is that there is a virus checker locking the files in the directory.
Regards
Paul Simon
Re: Server crash due to subset corruption
Posted: Fri Sep 12, 2008 9:55 am
by Steve Vincent
paulsimon wrote: "Another possibility is that there is a virus checker locking the files in the directory."
If it's random subsets, then the virus checking is where I'd be looking first. We had all sorts of problems with ours. You need to ensure the entire data directory is excluded from any virus checking on the server.
Re: Server crash due to subset corruption
Posted: Tue Nov 04, 2008 2:49 pm
by laurent
I experienced the same TM1 server hang using an MDX dynamic subset (TM1 90SP3U5).
What do you suggest exactly?
- remove the *.??$ files?
- do not scan the database directory?
Thank you for your help.
Laurent
Re: Server crash due to subset corruption
Posted: Sun Mar 29, 2020 8:12 pm
by mce
A similar issue, after so many years:
PH17265: TM1 SERVER CRASH DUE TO CORRUPT SUBSET
at https://www.ibm.com/support/pages/node/3799731