Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 1:43 pm
by jim wood
We came in this morning to find our UAT service down and the whole data directory empty. Luckily our logs are in a different directory and are still there. This is what we found:
Code: Select all
2836 [] INFO 2015-08-12 23:00:05.702 TM1.Chore Chore "Execute_MCPR_Stored_Processes _Every_1_Hr" finished executing
6396 [] INFO 2015-08-12 23:09:55.796 TM1.Server Closing...
6396 [] INFO 2015-08-12 23:09:55.796 TM1.Server Saving...
6396 [] INFO 2015-08-12 23:09:55.796 TM1.Server The server is coming down...
6396 [] INFO 2015-08-12 23:09:55.796 TM1.Server TM1ServerImpl::Destroy: disconnect clients
6396 [] INFO 2015-08-12 23:09:55.796 TM1.Server TM1ServerImpl::Destroy: Deactive all chores
6396 [] INFO 2015-08-12 23:09:55.796 TM1.Chore Deactivating chore: Execute_MCPR_Stored_Processes _Every_1_Hr
6396 [] INFO 2015-08-12 23:09:57.621 TM1.Chore Deactivating chore: Meta Build Project Hierarchy - CMWP Subsets
6396 [] INFO 2015-08-12 23:09:59.992 TM1.Chore Deactivating chore: Save_Data_All_Every_Hour
6396 [] INFO 2015-08-12 23:10:02.566 TM1.Server TM1ServerImpl::Destroy: Save server data
6396 [] INFO 2015-08-12 23:10:02.582 TM1.Server TM1ServerImpl::Destroy: destroy chores
6396 [] INFO 2015-08-12 23:10:02.582 TM1.Server TM1ServerImpl::Destroy: destroy processes
6396 [] INFO 2015-08-12 23:10:02.582 TM1.Server TM1ServerImpl::Destroy: destroy blobs
6396 [] INFO 2015-08-12 23:10:02.582 TM1.Server TM1ServerImpl::Destroy: destroy sets
6396 [] INFO 2015-08-12 23:10:02.582 TM1.Server TM1ServerImpl::Destroy: destroy groups
6396 [] INFO 2015-08-12 23:10:02.582 TM1.Server TM1ServerImpl::Destroy: destroy clients
6396 [] INFO 2015-08-12 23:10:02.597 TM1.Server TM1ServerImpl::Destroy: destroy cubes
6396 [] INFO 2015-08-12 23:10:02.956 TM1.Server TM1ServerImpl::Destroy: destroy dimensions
6396 [] INFO 2015-08-12 23:10:03.939 TM1.Server TM1ServerImpl::Destroy: destroy connections
6396 [] INFO 2015-08-12 23:10:03.939 TM1.Server TM1ServerImpl::Destroy: Destroy unregistered objects
6396 [] INFO 2015-08-12 23:10:03.939 TM1.Server TM1ServerImpl::Destroy: Commit changes
6280 [] INFO 2015-08-12 23:10:04.095 TM1.Server Terminating Admin Server poller thread.
6396 [] ERROR 2015-08-12 23:10:04.111 TM1.Server net_SetReadBufferAt: Attempted to set position = 6 past received network data size = 0.
6396 [] INFO 2015-08-12 23:10:04.111 TM1.Server Server shutdown
I have seen one post on here with something similar but no real help. Have any of you guys seen anything like it? We can't see any reason why it would happen,
Jim.
Re: Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 2:20 pm
by TrevorGoss
If this makes any difference,
in our logs the line "Terminating Admin Server poller thread." comes before the destruction of objects, cubes, processes etc...
Code: Select all
7336 [] INFO 2015-07-08 02:15:02.700 TM1.Server Closing...
7336 [] INFO 2015-07-08 02:15:02.700 TM1.Server Saving...
7336 [] INFO 2015-07-08 02:15:02.700 TM1.Server The server is coming down...
7336 [] INFO 2015-07-08 02:15:02.700 TM1.Server TM1ServerImpl::Destroy: disconnect clients
7336 [] INFO 2015-07-08 02:15:02.700 TM1.Server TM1ServerImpl::Destroy: Deactive all chores
7336 [] INFO 2015-07-08 02:15:02.700 TM1.Chore Deactivating chore: _HotBackupZip
7336 [] INFO 2015-07-08 02:15:06.834 TM1.Chore Deactivating chore: Chore_Reload_CoA_and_Prj_Mappings
7336 [] INFO 2015-07-08 02:15:09.408 TM1.Chore Deactivating chore: FindServiceDetails
7336 [] INFO 2015-07-08 02:15:13.589 TM1.Chore Deactivating chore: KickContractAnalysis_Reporting
7336 [] INFO 2015-07-08 02:15:18.050 TM1.Chore Deactivating chore: ManualTrigger_Reset_Calendar_To_Default
7336 [] INFO 2015-07-08 02:15:22.356 TM1.Chore Deactivating chore: ManualTriggerCalenderCheck
7336 [] INFO 2015-07-08 02:15:23.729 TM1.Chore Deactivating chore: NightlyRestartService
7336 [] INFO 2015-07-08 02:15:28.690 TM1.Chore Deactivating chore: PBF_CurrentForecast_Move
7336 [] INFO 2015-07-08 02:15:33.619 TM1.Chore Deactivating chore: Reprocess Costbase Feeders
6336 [] INFO 2015-07-08 02:15:33.978 TM1.Server Terminating Admin Server poller thread.
7336 [] INFO 2015-07-08 02:15:33.978 TM1.Server TM1ServerImpl::Destroy: Save server data
7336 [] INFO 2015-07-08 02:15:34.025 TM1.Server TM1ServerImpl::Destroy: destroy chores
7336 [] INFO 2015-07-08 02:15:34.025 TM1.Server TM1ServerImpl::Destroy: destroy processes
7336 [] INFO 2015-07-08 02:15:34.040 TM1.Server TM1ServerImpl::Destroy: destroy blobs
7336 [] INFO 2015-07-08 02:15:34.040 TM1.Server TM1ServerImpl::Destroy: destroy sets
7336 [] INFO 2015-07-08 02:15:34.040 TM1.Server TM1ServerImpl::Destroy: destroy groups
7336 [] INFO 2015-07-08 02:15:34.040 TM1.Server TM1ServerImpl::Destroy: destroy clients
7336 [] INFO 2015-07-08 02:15:34.040 TM1.Server TM1ServerImpl::Destroy: destroy cubes
7336 [] INFO 2015-07-08 02:15:34.165 TM1.Server TM1ServerImpl::Destroy: destroy dimensions
7336 [] INFO 2015-07-08 02:15:34.321 TM1.Server TM1ServerImpl::Destroy: destroy connections
7336 [] INFO 2015-07-08 02:15:34.321 TM1.Server TM1ServerImpl::Destroy: Destroy unregistered objects
7336 [] INFO 2015-07-08 02:15:34.321 TM1.Server TM1ServerImpl::Destroy: Commit changes
7336 [] INFO 2015-07-08 02:15:34.992 TM1.Server Server shutdown
Maybe this is something significant?
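To make the ordering comparison above repeatable, here is a minimal Python sketch that parses lines in the format shown in these log excerpts and reports whether the "Terminating Admin Server poller thread." line appears before or after the "Save server data" step. The line layout is inferred only from the excerpts in this thread, and the names `parse_line` and `poller_before_save` are hypothetical helpers, not part of any TM1 tooling.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Layout assumed from the excerpts in this thread:
# <thread id> [<session>] <LEVEL> <date> <time.ms> <logger> <message>
LINE_RE = re.compile(
    r"^(?P<thread>\d+)\s+\[(?P<session>[^\]]*)\]\s+(?P<level>\w+)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<logger>\S+)\s+(?P<msg>.*)$"
)


class Entry(NamedTuple):
    thread: str
    level: str
    ts: str
    logger: str
    msg: str


def parse_line(line: str) -> Optional[Entry]:
    """Parse one log line; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Entry(m["thread"], m["level"], m["ts"], m["logger"], m["msg"])


def poller_before_save(lines: Iterable[str]) -> Optional[bool]:
    """True if the poller-thread line precedes 'Save server data',
    False if it follows it, None if either line is missing."""
    poller_idx = save_idx = None
    for idx, line in enumerate(lines):
        entry = parse_line(line)
        if entry is None:
            continue
        if poller_idx is None and "Terminating Admin Server poller thread" in entry.msg:
            poller_idx = idx
        if save_idx is None and "Save server data" in entry.msg:
            save_idx = idx
    if poller_idx is None or save_idx is None:
        return None
    return poller_idx < save_idx
```

Run over the two excerpts in this thread, this would flag the first log (poller line after the save step) as differing from the second (poller line before it). It only describes the ordering; it says nothing about whether that ordering matters.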
Re: Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 2:31 pm
by BrianL
I've never heard of the data directory becoming empty. Was the server shutdown expected? If not, and you're running TM1 as a service, you could check the MS event viewer for Windows logs on why the service shut down.
Re: Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 2:34 pm
by declanr
Blame all your colleagues for pressing delete and see which one cracks first.
Re: Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 2:41 pm
by jim wood
Brian,
In the event viewer we have no critical errors. We have the following error at 7pm:
Code: Select all
tm1sdx64 error: 0
E16) Cannot connect to ODBC data source "TELEDB14_MCPR" IM002[Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified.
Then at 7.10pm we get this:
Code: Select all
tm1sdx64 error: 2
Data directory not specified. Aborting server start up.
Nothing in between that mentions the service coming down,
Jim.
Re: Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 2:57 pm
by BrianL
How about the "System" logs in the "Windows Logs" folder? Anything from "Service Control Manager"? I'd expect to see (at least) a message indicating the service entered the stopped state.
Re: Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 3:20 pm
by gtonkin
Hi Jim, is the data directory on the same device as the logs, not network attached / a symbolic link / junction etc?
I have seen something similar where we used a SAN and the SAN connection disappeared.
The only other thing that comes to mind is a script, run from Task Scheduler or similar, that may have truncated the directory.
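As a quick way to answer the "same device / symbolic link / network attached" question, a minimal Python sketch like the following could be run against the data directory. The function name `describe_data_dir` is hypothetical. It assumes that `os.path.realpath` resolves both symbolic links and junctions on Windows (documented for Python 3.8 and later), so a mismatch between the real path and the absolute path is treated as a hint that the directory is redirected somewhere else. It cannot tell whether a drive letter is backed by a SAN.

```python
import os


def describe_data_dir(path: str) -> dict:
    """Collect basic facts hinting at whether a directory is local or redirected."""
    exists = os.path.isdir(path)
    abs_path = os.path.abspath(path)
    real_path = os.path.realpath(path)
    return {
        "exists": exists,
        # True for symbolic links; junctions may not be reported here.
        "is_symlink": os.path.islink(path),
        # UNC-style paths (\\server\share or //server/share) point at a network share.
        "is_unc": path.startswith("\\\\") or path.startswith("//"),
        "real_path": real_path,
        # Assumption: a differing real path means a link or junction is involved.
        "redirected": os.path.normcase(real_path) != os.path.normcase(abs_path),
        # Number of entries left in the directory, or None if it is missing.
        "entry_count": len(os.listdir(path)) if exists else None,
    }
```

Calling `describe_data_dir` with the instance's data directory path and printing the result would show at a glance whether the folder still exists, how many entries remain, and whether it resolves to a different location.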
Re: Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 3:26 pm
by jim wood
BrianL wrote:How about the "System" logs in the "Windows Logs" folder? Anything from "Service Control Manager"? I'd expect to see (at least) a message indicating the service entered the stopped state.
Brian,
I couldn't find anything in the log directory.
Re: Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 3:28 pm
by jim wood
gtonkin wrote:Hi Jim, is the data directory on the same device as the logs, not network attached / a symbolic link / junction etc?
I have seen something similar where we used a SAN and the SAN connection disappeared.
The only other thing that comes to mind is a script, run from Task Scheduler or similar, that may have truncated the directory.
We are indeed on SAN storage. I've asked our server guy to see if there were any connection issues last night,
Jim.
Re: Loss of Whole Data Directory
Posted: Thu Aug 13, 2015 7:44 pm
by Alan Kirk
jim wood wrote:We came in this morning to our UAT service down and the whole data directory empty.
...
I have seen one post on here with something similar but no real help. Have any of you guys seen anything like it? We can't see any reason why it would happen,
The only time I've seen that happen is here, but you, unlike IBM, probably know better than to store data files in the Program Files path. That being the case, the SAN issue suggested by gtonkin would seem to be the more productive line of enquiry.
Re: Loss of Whole Data Directory
Posted: Fri Aug 14, 2015 12:02 pm
by jim wood
Alan Kirk wrote:The only time I've seen that happen is here, but you, unlike IBM, probably know better than to store data files in the Program Files path.
This was a setup I inherited. Thankfully this was one thing the previous owners got right. I mean one thing.
Alan Kirk wrote:That being the case the SAN issue suggested by gtonkin would seem to be the more productive line of enquiry.
We've asked our server team to check this out. They haven't spotted anything so far but it could have been caused by only a brief connection issue. This is not a production server so less attention is paid to it. It does (btw) sound like the most likely cause. I don't know how much backside covering is happening within the server team when they say they haven't found anything. Only time will tell I guess.
Thanks for all your input guys. When I get anything back from IBM I'll post it here for future reference,
Jim.
Re: Loss of Whole Data Directory
Posted: Mon Aug 17, 2015 10:01 pm
by Steve Rowe
Hi Jim,
Just to say in my last role we were running prod / uat / dev for 30 plus instances on SANs and not once did I hear of the DD going totally AWOL in the few years I was there.
If the SAN had dropped off for some time I can't see how the whole DD would get deleted, at worst the instance would be unable to write back to the DD and then fall over or you would end up with a bunch of dot $ files or similar? You could test the behaviour of TM1 when the DD goes missing by just running up an instance and deleting / moving the DD and see what messages TM1 produces.
IMO the most likely explanation is that someone deleted the DD in error and is keeping quiet... In the absence of any evidence that there was a technical issue I'd be shrugging my shoulders and moving on and maybe look at the security on the infrastructure.
Cheers,
Re: Loss of Whole Data Directory
Posted: Tue Aug 18, 2015 12:17 pm
by jim wood
Steve Rowe wrote:IMO the most likely explanation is that someone deleted the DD in error and is keeping quiet... In the absence of any evidence that there was a technical issue I'd be shrugging my shoulders and moving on and maybe look at the security on the infrastructure.
Normally I'd think the same but only the files stated in the log are missing. The view folders etc are still there. The only file deleted that wasn't in the log was the CFG file.
Re: Loss of Whole Data Directory
Posted: Tue Jun 09, 2020 4:08 pm
by gtonkin
Jim, did you ever get a resolution or a better understanding of why you got the message?
We also recently received a similar message to yours:
Code: Select all
19880 [284a4] ERROR 2020-06-09 04:48:54.599 TM1.Server.Network net_SetReadBufferAt: Attempted to set position past received network data size.
This message seems to repeat about 20 times in the same instant i.e. at 2020-06-09 04:48:54.599.
We have filled up our 20 TM1 server logs of 100MB each and it is now logging and rolling the logs.
Not sure about any data loss at this stage but server is still running.
Have logged a change request to have the physical server rebooted but just hoping to hear of anything that could be checked before the reboot.
Server is still 10.2.2 FP4 (migration to PA imminent)
If anyone else has any insight, I would appreciate it.
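For anyone wanting to quantify how often a message like the one above repeats within the same instant, here is a minimal Python sketch that groups ERROR lines by timestamp and message. The line layout is assumed from the excerpts in this thread, and `error_bursts` is a hypothetical helper name rather than anything shipped with TM1.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Layout assumed from the excerpts in this thread:
# <thread id> [<session>] <LEVEL> <date> <time.ms> <logger> <message>
LINE_RE = re.compile(
    r"^(?P<thread>\d+)\s+\[(?P<session>[^\]]*)\]\s+(?P<level>\w+)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<logger>\S+)\s+(?P<msg>.*)$"
)


def error_bursts(lines: Iterable[str], min_repeats: int = 2) -> List[Tuple[str, str, int]]:
    """Return (timestamp, message, count) for ERROR lines that repeat
    at least min_repeats times with an identical timestamp and message."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is not None and m["level"] == "ERROR":
            counts[(m["ts"], m["msg"])] += 1
    # most_common() orders the groups from most to least frequent.
    return [(ts, msg, n) for (ts, msg), n in counts.most_common() if n >= min_repeats]
```

Feeding it the lines of a server log file (for example via `open(path, errors="replace")`) would list each burst with its count, which may be easier to attach to a support case than 100MB log files.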