Replication
Posted: Fri Sep 03, 2010 11:27 am
by mce
Hi,
Due to frequent locking issues that cause the TM1 server to hang, and in order to improve the availability of TM1 data, we added a new TM1 server as a replica of the production one, kept up to date through scheduled replication/synchronization. The production server is TX, which runs under admin server AX. The replicated TM1 server TY is attached to both admin servers AX and AY. Since the shared admin server is AX, replication happens through admin server AX.
When I run SyncronizeAll for the replication process, although TM1Top displays the replication user as idle in TX, nobody is able to log in or do anything in TX. I want to avoid this situation, and I want users to be able to continue using TX during the replication process, especially when the replication user is idle in TX.
- Is that possible at all?
- Assuming that users are unable to log in to TX through AX because AX is busy with replication: if I assign TX to AY, and disconnect TY from AY to make AY the only shared admin server for replication, would this allow users who reach TX through AX to connect and use TX during the replication/synchronization process?
Any ideas or comments on this problem would really be appreciated.
Regards,
Re: Replication
Posted: Fri Sep 03, 2010 11:44 am
by lotsaram
Is the additional replicated server a read only reporting server?
If you don't need very frequent updates of the replicated server to mirror production in near real time (say you only need a daily or 2x daily sync) then you really don't need replication; you should be able to just copy the data directory after a save data on the main server and restart the replicated server.
In certain circumstances this can be less hassle and give better performance than rep sync.
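A minimal sketch of the copy-and-restart approach, in Python. The function name and directory layout are hypothetical, and it assumes the main server has just saved its data and the replicated server is stopped; stopping and starting the TM1 services is deliberately left out because that depends on your installation.

```python
import shutil
from pathlib import Path


def refresh_replica_data(source_dir: Path, replica_dir: Path) -> None:
    """Replace replica_dir with a fresh copy of source_dir.

    Assumption: the source data was just saved to disk and the
    replica server is not running while this executes.
    """
    staging = replica_dir.with_name(replica_dir.name + ".incoming")
    if staging.exists():
        shutil.rmtree(staging)  # clear leftovers from an earlier failed run

    # Copy into a staging folder first, so a failed copy leaves the
    # replica's existing data directory untouched.
    shutil.copytree(source_dir, staging)

    if replica_dir.exists():
        shutil.rmtree(replica_dir)
    staging.rename(replica_dir)
```

Something like `refresh_replica_data(Path("/tm1/prod/data"), Path("/tm1/report/data"))` (both paths made up) would then sit between the stop and start of the reporting server in a scheduled job.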
Most of your problems are more than likely due to your TM1 version. 9.0.1 is not even a particularly great version of 9.0 - you have never explained why upgrading is not an option.
Re: Replication
Posted: Fri Sep 03, 2010 1:25 pm
by mce
Hi Lotsaram,
We need more frequent updates (about every 2 hours) for the replica server, which will be read only for users, for reporting. Therefore copying the data folder and restarting the server does not seem to be a good option.
The upgrade of the TM1 server is scheduled to take place later. Until then, we want the system to operate as efficiently as possible.
Regards,
Re: Replication
Posted: Fri Sep 03, 2010 4:58 pm
by mattgoff
mce wrote:When I run SyncronizeAll for the replication process, although TM1Top displays the replication user as idle in TX, nobody is able to log in or do anything in TX. I want to avoid this situation, and I want users to be able to continue using TX during the replication process, especially when the replication user is idle in TX.
- Is that possible at all?
No. Replication sets a global lock on the initiating server. You might be able to swap, however, so the master server initiates the replication with your reporting server.
mce wrote: - Assuming that users are unable to log in to TX through AX because AX is busy with replication: if I assign TX to AY, and disconnect TY from AY to make AY the only shared admin server for replication, would this allow users who reach TX through AX to connect and use TX during the replication/synchronization process?
No, the reason users can't log in is because there's a global lock applied. TM1 cannot access security cubes to authenticate users. It has nothing to do with the admin servers.
I think you're jumping out of the frying pan and into the fire with this architecture. Replication really works terribly in TM1. I wouldn't recommend it to anyone. If it wasn't too late for us, I would buy one monster server and use RDP for our remote sites. We bought multiple servers to overcome TM1's WAN performance, but instead we suffer with failed replications, long locking times, a complicated security model, and painful administration (upgrading 9 servers in one go is no fun).
I would consider upgrading to a 9.1+ version of TM1 which has the more granular locking model. I presume your locking is due to TI and probably also specifically due to data loads. Once you're on a version with better locking (and the opinions on this vary of course), you can optimize your model accordingly (load into a "naked" cube and copy into live/linked cubes after data is local, divide cubes if user writes are the issue, etc).
Matt
Re: Replication
Posted: Fri Sep 03, 2010 6:38 pm
by mattgoff
Two things I forgot to add:
- Although it may look idle, it's probably not. TM1Top's default refresh rate is every two seconds (and the max rate is every second), so there's a lot of time for work to be done while the server still appears "idle" when TM1Top pulls the status.
- Despite my negativity, there are a lot of things you can do to improve replication performance. The most important, probably, is to never copy subsets and views. This can dramatically slow down a replication as they have to be checked every time and the replication message protocol is terribly inefficient with these objects (not that it's particularly efficient with any objects).
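To illustrate the first point, here is a small Python simulation of a monitor that only looks at a thread's state at fixed intervals. It is not how TM1Top is implemented; the busy pattern is invented purely to show that a thread can be working most of the time and still read "idle" at every poll.

```python
def is_busy(t, busy_intervals):
    """True if time t falls inside any [start, end) busy interval."""
    return any(start <= t < end for start, end in busy_intervals)


def sample_states(busy_intervals, refresh_s, duration_s):
    """State seen by a monitor polling every refresh_s seconds."""
    polls = int(duration_s / refresh_s) + 1
    return [
        "busy" if is_busy(i * refresh_s, busy_intervals) else "idle"
        for i in range(polls)
    ]


# Hypothetical thread: busy for 1.5 s out of every 2 s (75% of the time),
# but always between the polls of a 2-second refresh.
busy = [(k * 2.0 + 0.25, k * 2.0 + 1.75) for k in range(5)]

print(sample_states(busy, refresh_s=2.0, duration_s=10.0))
# all six samples read "idle"
print(sample_states(busy, refresh_s=1.0, duration_s=10.0))
# polling every second catches the work on every odd second
```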
Matt
Re: Replication
Posted: Thu Sep 16, 2010 12:19 pm
by mce
Hi,
Thanks for the info.
I realized that when you replicate a cube manually, the replication user accesses the source server only while reading data, which does not take much time even for a big cube.
After reading the data from the source server, replication takes a long time to finish for a big cube with heavy rules, but during that time it does not access the source server and therefore does not lock it. However, when we run SyncronizeAll, it keeps the lock on the source server during the entire replication/synchronization, even after it has finished reading data for individual cubes. This leaves us running replication manually for each of the heavy cubes and using SyncronizeAll for the other cubes. This is quite annoying!
Regards,
Re: Replication
Posted: Thu Sep 16, 2010 1:23 pm
by mattgoff
mce wrote:However, when we run SyncronizeAll, it keeps the lock on the source server during the entire replication/synchronization, even after it has finished reading data for individual cubes.
Although I don't remember this happening for us on 9.0, this is definitely not the case with newer versions of TM1, probably due to the new locking model. I have seven planets replicating with my star-- if they locked the star during replication, star users would never get any work done!
Matt
Re: Replication
Posted: Fri Sep 17, 2010 5:31 pm
by mce
Another question:
Is the following statement correct about replication?
"Since the SyncronizeAll process looks at transaction log files in the source (Star) server to check if there are any changes, it will not be able to recognize those data changes that happened through TI Processes that are not recorded in transaction logs. Therefore you will lose the reconciliation of data between Star and Planet servers as you transfer data to Star through TI processes."
Regards,
Re: Replication
Posted: Fri Sep 17, 2010 7:41 pm
by mattgoff
mce wrote:Is the following statement correct about replication?
"Since the SyncronizeAll process looks at transaction log files in the source (Star) server to check if there are any changes, it will not be able to recognize those data changes that happened through TI Processes that are not recorded in transaction logs. Therefore you will lose the reconciliation of data between Star and Planet servers as you transfer data to Star through TI processes."
That's a little muddled at the end, but mostly right. Replication works by replaying and reconciling transaction logs from both servers. If a data change is not logged, it will not be replicated and the servers will become (silently) desynchronized. So, for any replicated cubes, you must have cube logging enabled and you must never disable logging during a process writing to that cube.
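A toy Python model of why an unlogged write goes missing. This is a deliberately simplified, one-way picture (real replication reconciles logs from both servers), and the `Server` class and `replicate` function are made up for illustration; they are not TM1 APIs.

```python
class Server:
    """Toy server: cell values plus a transaction log of logged writes."""

    def __init__(self):
        self.cells = {}
        self.log = []

    def write(self, cell, value, logging=True):
        self.cells[cell] = value
        if logging:  # a write made with logging off leaves no trace in the log
            self.log.append((cell, value))


def replicate(source, target, from_pos):
    """Replay source log entries from from_pos onto target."""
    for cell, value in source.log[from_pos:]:
        target.cells[cell] = value
    return len(source.log)  # position to resume from next time


star, planet = Server(), Server()
star.write(("GL", "Jan"), 100)                 # logged
star.write(("GL", "Feb"), 200, logging=False)  # e.g. a load with logging disabled
replicate(star, planet, from_pos=0)

out_of_sync = {c for c in star.cells if star.cells[c] != planet.cells.get(c)}
print(out_of_sync)  # {('GL', 'Feb')}
```

Nothing fails or warns here: the replay simply never sees the Feb value, which is the "silent" part.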
This can create a lot of log file entries. We reload the current period of our GL cube every hour from Oracle. In order to allow this cube to be replicated, we have to log that load process. As a result, we have 2x log entries for every non-zero cell in that load (one to zero the prior value via ViewZeroOut, one to load the value via TI/ODBC). I've never traced the packets to see if TM1 is smart enough to only send the most-recently revised value or if the full log is completely replayed. However, we have had issues where replication was paused for a few days and the first replication was very, very long, so my guess is no.
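The 2x arithmetic can be sketched in Python as follows. The cell names and values are invented, and `latest_only` shows what "only send the most-recently revised value" would look like if a replication protocol did it; as said above, I don't know that TM1 does.

```python
def reload_log_entries(old_values, new_values):
    """Log entries produced by a zero-out followed by a logged reload."""
    log = []
    for cell, value in old_values.items():
        if value != 0:
            log.append((cell, 0))      # zero-out step clears the prior value
    for cell, value in new_values.items():
        if value != 0:
            log.append((cell, value))  # load step writes the fresh value
    return log


def latest_only(log):
    """Hypothetical compaction: keep just the last value logged per cell."""
    latest = {}
    for cell, value in log:
        latest[cell] = value
    return latest


old = {"4000": 10, "4100": 20}  # made-up GL cells for the current period
new = {"4000": 11, "4100": 20}
log = reload_log_entries(old, new)

print(len(log))          # 4 entries for 2 non-zero cells
print(latest_only(log))  # {'4000': 11, '4100': 20}
```

With an hourly reload, a few days of paused replication multiplies that log many times over, which would fit the very long catch-up if every entry is replayed.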
Matt