Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 1:43 pm
by jim wood
We came in this morning to our UAT service down and the whole data directory empty. Luckily our logs are in a different directory and are still there. This is what we found:

Code:

2836   []   INFO   2015-08-12 23:00:05.702   TM1.Chore   Chore "Execute_MCPR_Stored_Processes _Every_1_Hr" finished executing
6396   []   INFO   2015-08-12 23:09:55.796   TM1.Server   Closing...
6396   []   INFO   2015-08-12 23:09:55.796   TM1.Server   Saving...
6396   []   INFO   2015-08-12 23:09:55.796   TM1.Server   The server is coming down...
6396   []   INFO   2015-08-12 23:09:55.796   TM1.Server   TM1ServerImpl::Destroy: disconnect clients
6396   []   INFO   2015-08-12 23:09:55.796   TM1.Server   TM1ServerImpl::Destroy: Deactive all chores
6396   []   INFO   2015-08-12 23:09:55.796   TM1.Chore   Deactivating chore: Execute_MCPR_Stored_Processes _Every_1_Hr
6396   []   INFO   2015-08-12 23:09:57.621   TM1.Chore   Deactivating chore: Meta Build Project Hierarchy - CMWP Subsets
6396   []   INFO   2015-08-12 23:09:59.992   TM1.Chore   Deactivating chore: Save_Data_All_Every_Hour
6396   []   INFO   2015-08-12 23:10:02.566   TM1.Server   TM1ServerImpl::Destroy: Save server data
6396   []   INFO   2015-08-12 23:10:02.582   TM1.Server   TM1ServerImpl::Destroy: destroy chores
6396   []   INFO   2015-08-12 23:10:02.582   TM1.Server   TM1ServerImpl::Destroy: destroy processes
6396   []   INFO   2015-08-12 23:10:02.582   TM1.Server   TM1ServerImpl::Destroy: destroy blobs
6396   []   INFO   2015-08-12 23:10:02.582   TM1.Server   TM1ServerImpl::Destroy: destroy sets
6396   []   INFO   2015-08-12 23:10:02.582   TM1.Server   TM1ServerImpl::Destroy: destroy groups
6396   []   INFO   2015-08-12 23:10:02.582   TM1.Server   TM1ServerImpl::Destroy: destroy clients
6396   []   INFO   2015-08-12 23:10:02.597   TM1.Server   TM1ServerImpl::Destroy: destroy cubes
6396   []   INFO   2015-08-12 23:10:02.956   TM1.Server   TM1ServerImpl::Destroy: destroy dimensions
6396   []   INFO   2015-08-12 23:10:03.939   TM1.Server   TM1ServerImpl::Destroy: destroy connections
6396   []   INFO   2015-08-12 23:10:03.939   TM1.Server   TM1ServerImpl::Destroy: Destroy unregistered objects
6396   []   INFO   2015-08-12 23:10:03.939   TM1.Server   TM1ServerImpl::Destroy: Commit changes
6280   []   INFO   2015-08-12 23:10:04.095   TM1.Server   Terminating Admin Server poller thread.
6396   []   ERROR   2015-08-12 23:10:04.111   TM1.Server   net_SetReadBufferAt: Attempted to set position = 6 past received network data size = 0.
6396   []   INFO   2015-08-12 23:10:04.111   TM1.Server   Server shutdown
I have seen one post on here with something similar but no real help. Have any of you guys seen anything like it? We can't see any reason why it would happen,

Jim.

Re: Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 2:20 pm
by TrevorGoss
If this makes any difference,

in our logs the line "Terminating Admin Server poller thread." comes before the destruction of objects, cubes, processes etc...

Code:

7336   []   INFO   2015-07-08 02:15:02.700   TM1.Server   Closing...
7336   []   INFO   2015-07-08 02:15:02.700   TM1.Server   Saving...
7336   []   INFO   2015-07-08 02:15:02.700   TM1.Server   The server is coming down...
7336   []   INFO   2015-07-08 02:15:02.700   TM1.Server   TM1ServerImpl::Destroy: disconnect clients
7336   []   INFO   2015-07-08 02:15:02.700   TM1.Server   TM1ServerImpl::Destroy: Deactive all chores
7336   []   INFO   2015-07-08 02:15:02.700   TM1.Chore   Deactivating chore: _HotBackupZip
7336   []   INFO   2015-07-08 02:15:06.834   TM1.Chore   Deactivating chore: Chore_Reload_CoA_and_Prj_Mappings
7336   []   INFO   2015-07-08 02:15:09.408   TM1.Chore   Deactivating chore: FindServiceDetails
7336   []   INFO   2015-07-08 02:15:13.589   TM1.Chore   Deactivating chore: KickContractAnalysis_Reporting
7336   []   INFO   2015-07-08 02:15:18.050   TM1.Chore   Deactivating chore: ManualTrigger_Reset_Calendar_To_Default
7336   []   INFO   2015-07-08 02:15:22.356   TM1.Chore   Deactivating chore: ManualTriggerCalenderCheck
7336   []   INFO   2015-07-08 02:15:23.729   TM1.Chore   Deactivating chore: NightlyRestartService
7336   []   INFO   2015-07-08 02:15:28.690   TM1.Chore   Deactivating chore: PBF_CurrentForecast_Move
7336   []   INFO   2015-07-08 02:15:33.619   TM1.Chore   Deactivating chore: Reprocess Costbase Feeders
6336   []   INFO   2015-07-08 02:15:33.978   TM1.Server   Terminating Admin Server poller thread.
7336   []   INFO   2015-07-08 02:15:33.978   TM1.Server   TM1ServerImpl::Destroy: Save server data
7336   []   INFO   2015-07-08 02:15:34.025   TM1.Server   TM1ServerImpl::Destroy: destroy chores
7336   []   INFO   2015-07-08 02:15:34.025   TM1.Server   TM1ServerImpl::Destroy: destroy processes
7336   []   INFO   2015-07-08 02:15:34.040   TM1.Server   TM1ServerImpl::Destroy: destroy blobs
7336   []   INFO   2015-07-08 02:15:34.040   TM1.Server   TM1ServerImpl::Destroy: destroy sets
7336   []   INFO   2015-07-08 02:15:34.040   TM1.Server   TM1ServerImpl::Destroy: destroy groups
7336   []   INFO   2015-07-08 02:15:34.040   TM1.Server   TM1ServerImpl::Destroy: destroy clients
7336   []   INFO   2015-07-08 02:15:34.040   TM1.Server   TM1ServerImpl::Destroy: destroy cubes
7336   []   INFO   2015-07-08 02:15:34.165   TM1.Server   TM1ServerImpl::Destroy: destroy dimensions
7336   []   INFO   2015-07-08 02:15:34.321   TM1.Server   TM1ServerImpl::Destroy: destroy connections
7336   []   INFO   2015-07-08 02:15:34.321   TM1.Server   TM1ServerImpl::Destroy: Destroy unregistered objects
7336   []   INFO   2015-07-08 02:15:34.321   TM1.Server   TM1ServerImpl::Destroy: Commit changes
7336   []   INFO   2015-07-08 02:15:34.992   TM1.Server   Server shutdown
Maybe this is something significant?
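
For anyone wanting to check this ordering in their own logs, here is a minimal Python sketch. It assumes the log layout seen in the excerpts above (an ID column, a bracketed session, level, timestamp, logger, message); the function names are made up for illustration and are not part of TM1. Note that in both excerpts the poller line carries a different ID in the first column from the rest of the shutdown lines, so if that column is a thread ID (an assumption), the difference in order may be nothing more than timing between two threads.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts in this thread:
# <id>   [<session>]   <level>   <timestamp>   <logger>   <message>
LINE_RE = re.compile(
    r"^(?P<thread>\d+)\s+\[(?P<session>[^\]]*)\]\s+(?P<level>\w+)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<logger>\S+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Split one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return record


def first_index(records, fragment):
    """Position of the first record whose message contains fragment."""
    for i, record in enumerate(records):
        if fragment in record["message"]:
            return i
    return None


def poller_before_save(lines):
    """True if the poller line is logged before 'Save server data'.

    Returns None when either line is absent from the input.
    """
    records = [r for r in map(parse_line, lines) if r is not None]
    poller = first_index(records, "Terminating Admin Server poller thread")
    save = first_index(records, "Destroy: Save server data")
    if poller is None or save is None:
        return None
    return poller < save


# Two-line excerpts copied from the logs posted above.
JIM_LINES = [
    "6396   []   INFO   2015-08-12 23:10:02.566   TM1.Server   TM1ServerImpl::Destroy: Save server data",
    "6280   []   INFO   2015-08-12 23:10:04.095   TM1.Server   Terminating Admin Server poller thread.",
]
TREVOR_LINES = [
    "6336   []   INFO   2015-07-08 02:15:33.978   TM1.Server   Terminating Admin Server poller thread.",
    "7336   []   INFO   2015-07-08 02:15:33.978   TM1.Server   TM1ServerImpl::Destroy: Save server data",
]

print(poller_before_save(JIM_LINES), poller_before_save(TREVOR_LINES))
```

To run it against a whole file, pass the lines of the server log instead of the two excerpts.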

Re: Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 2:31 pm
by BrianL
I've never heard of the data directory becoming empty. Was the server shutdown expected? If not, and you're running TM1 as a service, you could check the MS event viewer for Windows logs on why the service shut down.

Re: Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 2:34 pm
by declanr
Blame all your colleagues for pressing delete and see which one cracks first.

Re: Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 2:41 pm
by jim wood
Brian,

In the event viewer we have no critical errors. We have the following error at 7pm:

Code:

 tm1sdx64 error: 0 
   E16) Cannot connect to ODBC data source "TELEDB14_MCPR" IM002[Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified. 
Then at 7.10pm we get this:

Code:

tm1sdx64 error: 2 
   Data directory not specified. Aborting server start up. 
Nothing in between that mentions the service coming down,

Jim.

Re: Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 2:57 pm
by BrianL
How about the "System" logs in the "Windows Logs" folder? Anything from "Service Control Manager"? I'd expect to see (at least) a message indicating the service entered the stopped state.

Re: Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 3:20 pm
by gtonkin
Hi Jim, is the data directory on the same device as the logs, not network attached / a symbolic link / junction etc?
I have seen something similar where we used a SAN and the SAN connection disappeared.
The only other thing that comes to mind is a script, Task Scheduler job or similar that may have truncated the directory.

Re: Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 3:26 pm
by jim wood
BrianL wrote:How about the "System" logs in the "Windows Logs" folder? Anything from "Service Control Manager"? I'd expect to see (at least) a message indicating the service entered the stopped state.
Brian,

I couldn't find anything in the log directory.

Re: Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 3:28 pm
by jim wood
gtonkin wrote:Hi Jim, is the data directory on the same device as the logs, not network attached / a symbolic link / junction etc?
I have seen something similar where we used a SAN and the SAN connection disappeared.
The only other thing that comes to mind is a script, Task Scheduler job or similar that may have truncated the directory.
We are indeed on SAN storage. I've asked our server guy to see if there were any connection issues last night,

Jim.

Re: Loss of Whole Data Directory

Posted: Thu Aug 13, 2015 7:44 pm
by Alan Kirk
jim wood wrote:We came in this morning to our UAT service down and the whole data directory empty.
...
I have seen one post on here with something similar but no real help. Have any of you guys seen anything like it? We can't see any reason why it would happen,
The only time I've seen that happen is here, but you, unlike IBM, probably know better than to store data files in the Program Files path. That being the case the SAN issue suggested by gtonkin would seem to be the more productive line of enquiry.

Re: Loss of Whole Data Directory

Posted: Fri Aug 14, 2015 12:02 pm
by jim wood
Alan Kirk wrote:The only time I've seen that happen is here, but you, unlike IBM, probably know better than to store data files in the Program Files path.
This was a setup I inherited. Thankfully this was one thing the previous owners got right. I mean one thing.
Alan Kirk wrote:That being the case the SAN issue suggested by gtonkin would seem to be the more productive line of enquiry.
We've asked our server team to check this out. They haven't spotted anything so far but it could have been caused by only a brief connection issue. This is not a production server so less attention is paid to it. It does (btw) sound like the most likely cause. I don't know how much backside covering is happening within the server team when they say they haven't found anything. Only time will tell I guess.

Thanks for all your input guys. When I get anything back from IBM I'll post it here for future reference,

Jim.

Re: Loss of Whole Data Directory

Posted: Mon Aug 17, 2015 10:01 pm
by Steve Rowe
Hi Jim,
Just to say in my last role we were running prod / uat / dev for 30 plus instances on SANs and not once did I hear of the DD going totally AWOL in the few years I was there.

If the SAN had dropped off for some time I can't see how the whole DD would get deleted, at worst the instance would be unable to write back to the DD and then fall over or you would end up with a bunch of dot $ files or similar? You could test the behaviour of TM1 when the DD goes missing by just running up an instance and deleting / moving the DD and see what messages TM1 produces.
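
For a test along those lines, it may help to record what is in the data directory before and after, so you can say exactly which files went missing. A minimal Python sketch follows; the function names and the demo file names are hypothetical, and the demo runs against a temporary folder rather than a real TM1 data directory.

```python
import tempfile
from pathlib import Path


def snapshot(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def missing_since(before, after):
    """Files present in the earlier snapshot but absent from the later one."""
    return sorted(before - after)


# Demo on a throwaway folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "views").mkdir()
    (demo_root / "demo.cub").write_text("x")
    (demo_root / "views" / "demo.vue").write_text("x")

    before = snapshot(demo_root)
    (demo_root / "demo.cub").unlink()  # simulate one file disappearing
    after = snapshot(demo_root)

    demo_missing = missing_since(before, after)

print(demo_missing)
```

Taking the first snapshot on a schedule and keeping it outside the data directory would give you something to compare against if the files vanish again.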

IMO the most likely explanation is that someone deleted the DD in error and is keeping quiet... In the absence of any evidence that there was a technical issue I'd be shrugging my shoulders and moving on, and maybe look at the security on the infrastructure.

Cheers,

Re: Loss of Whole Data Directory

Posted: Tue Aug 18, 2015 12:17 pm
by jim wood
Steve Rowe wrote:IMO the most likely explanation is that someone deleted the DD in error and is keeping quiet... In the absence of any evidence that there was a technical issue I'd be shrugging my shoulders and moving on, and maybe look at the security on the infrastructure.
Normally I'd think the same but only the files stated in the log are missing. The view folders etc are still there. The only file deleted that wasn't in the log was the CFG file.

Re: Loss of Whole Data Directory

Posted: Tue Jun 09, 2020 4:08 pm
by gtonkin
Jim, did you ever get a resolution or a better understanding of why you got the message?

We also recently received a similar message to yours:

Code:

19880   [284a4]   ERROR   2020-06-09 04:48:54.599   TM1.Server.Network   net_SetReadBufferAt: Attempted to set position past received network data size.
This message seems to repeat about 20 times in the same instant i.e. at 2020-06-09 04:48:54.599.
We have filled up our 20 TM1 server logs of 100MB each and it is now logging and rolling the logs.
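
To put a number on how often the message repeats per timestamp, something like the Python sketch below could be run over the rolled logs. It assumes the whitespace-separated layout of the line above; `count_bursts` is a name invented here, and the sample repeats the line three times purely for illustration.

```python
from collections import Counter


def count_bursts(lines, needle="net_SetReadBufferAt"):
    """Count lines containing needle, grouped by their timestamp."""
    counts = Counter()
    for line in lines:
        if needle not in line:
            continue
        # Assumed fields: id, [session], level, date, time, logger, message...
        fields = line.split()
        if len(fields) < 5:
            continue
        counts[f"{fields[3]} {fields[4]}"] += 1
    return counts


ERROR_LINE = (
    "19880   [284a4]   ERROR   2020-06-09 04:48:54.599   TM1.Server.Network   "
    "net_SetReadBufferAt: Attempted to set position past received network data size."
)
# Illustrative input: the error three times plus one unrelated line.
SAMPLE = [ERROR_LINE] * 3 + [
    "6396   []   INFO   2015-08-12 23:10:04.111   TM1.Server   Server shutdown",
]

for stamp, n in count_bursts(SAMPLE).most_common():
    print(stamp, n)
```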

Not sure about any data loss at this stage but server is still running.

Have logged a change request to have the physical server rebooted but just hoping to hear of anything that could be checked before the reboot.
Server is still 10.2.2 FP4 (migration to PA imminent)

If anyone else has any insight, I would appreciate it.