Running a TI only if the data has changed

tosca1978
Posts: 101
Joined: Thu Oct 20, 2011 6:53 am
OLAP Product: TM1
Version: 9.5.2
Excel Version: 2007
Location: London, UK

Running a TI only if the data has changed

Post by tosca1978 »

Hi all. I am on v10.1...

Apologies for the long post. If you haven't got time to read it all, skip to the question right at the bottom! If you have got time, you might be able to point out whether I'm asking the wrong question - a design fault, perhaps.

Setting the scene:

I have 2 cubes - an input cube and a reporting cube. The input cube only has 3 dims - version/geographic area/measures. The inputters are global. There is element-level security on all three dimensions - so inputters can only input against the "Current Month" version, only have write access to their own geographic area, and ownership of the elements within the measures dim is split between 2 user groups (although both groups need READ access to the other group's elements).

Some of the measures are string (some free text but most pick lists to categorise data), whilst others are numeric.

The reporting cube has 9 dimensions. Some of the measures in the input cube (the string measures that have pick lists against them) are turned into dimensions in the reporting cube. For example, in the input cube a measure called Project Manager has a pick list in it containing a list of Project Managers. In the Reporting cube, there is a dimension for project manager. The main reason for this is because there are certain measures (from the input cube) that end users wanted to be able to filter on in the BI reporting that sits on top of the reporting cube - for example all data relating to one project manager. I am a TM1 developer, not a BI report developer, but when the BI developer and I worked together to get the BI reporting off the TM1 reporting cube, it became apparent that it is difficult to filter string measures in BI (it expects a measure to be numeric). The only way around this was to dimensionalise any measure that the users wanted to filter in the reports. This all works well and the BI reporting from the TM1 reporting cube performs very well (when all values are static).

My problem - how do I get data from input cube to reporting cube?

Rule

Originally I used a rule to get the data from the input cube to the reporting cube - purely because "real time reporting" was a requirement. The dimension order of the cube was optimal and the rule-derived content was fed efficiently (in my opinion - perhaps it could have been even more efficient though). However, the BI reporting performed poorly due to the calc time of the cube. I reasoned that since the data wouldn't change that often, the users were suffering a performance hit for real-time data unnecessarily.

TI - version 1

I changed from a rule to a TI to get the data from the input cube to the reporting cube. Because none of the cells in the reporting cube were rule-derived - all values were static - the BI reporting off the cube performed very well - much faster.

The first version of this was based on the following:

1. End user inputs data into input cube in TM1 web
2. Clicks action button that runs a TI (and then waits for it to complete)
3. TI does the following:
a) Deletes all numeric and string data in the reporting cube for the geographic area that the end user is inputting data for
b) Takes data from the input cube to the reporting cube for that geographic area

This didn't work for the following reasons:

1. It took a long time - roughly 30 seconds (because the TI is run on demand as the user clicks the action button, anything more than 5-10 seconds is not really acceptable). CubeClearData couldn't be used because the user would not have sufficient security to copy all of the input cube's data into the reporting cube. CellPutProportionalSpread could be used on the numeric data, but the code to clear out string data uses a while loop - which takes longer (see the sketch after this list).
2. The user would never have the correct security access to clear out all data in the measures dim of the reporting cube due to the element security needed.
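To illustrate, the string-clearing loop looks roughly like this (simplified - fewer dimensions are shown than the real reporting cube has, and the names are changed):

    sCube = 'Reporting';
    sMeasDim = 'Reporting Measure';
    nMax = DIMSIZ(sMeasDim);
    n = 1;
    WHILE(n <= nMax);
      sEl = DIMNM(sMeasDim, n);
      # DTYPE returns 'S' for string elements
      IF(DTYPE(sMeasDim, sEl) @= 'S');
        # one CellPutS per cell - pVersion, pArea, pPM stand in for the
        # elements of the reporting cube's other dimensions
        CellPutS('', sCube, pVersion, pArea, pPM, sEl);
      ENDIF;
      n = n + 1;
    END;

One CellPutS per cell is why this is so much slower than a ViewZeroOut (which in any case only touches numeric data).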

TI - version 2

The second version of this was based on the following:

1. End user inputs data into the input cube in TM1 Web - doesn't have to run a TI, so no need to wait for it to complete.
2. A chore runs every 2 mins to run a TI that does the following:
a) CubeClearData
b) takes ALL data from the input cube to the reporting cube

This runs very quickly, since CubeClearData operates much faster than clearing out data piecemeal as the first version of the TI did. It takes about 10 seconds to complete. The end user only has to wait a maximum of 2 mins to see the latest changed data in the reports - so it is practically "real time reporting".

However - 95% of the time the TI doesn't need to run because no data has changed. The logs quickly build up which means the SaveDataAll chore that I have running every 30 mins takes a longer time.

So the question is:

How can I make it so that the TI only runs if the data has changed in the input cube?

I know that I cannot interrogate the current transaction log as it is in use, but I could interrogate the last transaction log after a SaveDataAll has been completed. Not sure that would help though, given the time lag it would introduce for "real time reporting".

Perhaps I need to go back to a rule and suffer the performance hit on the BI reporting side.

Thanks to anyone who has taken the time to read the above; any suggestions would be most welcome.

Kind regards.
lotsaram
MVP
Posts: 3661
Joined: Fri Mar 13, 2009 11:14 am
OLAP Product: TableManager1
Version: PA 2.0.x
Excel Version: Office 365
Location: Switzerland

Re: Running a TI only if the data has changed

Post by lotsaram »

My suggestion would be to use RunTI.exe, which is included with 10.1. The solution would be to segregate the VALIDATION of the parameters for clearing data in the reporting cube and loading new data from the actual EXECUTION. This is done by calling RunTI via a command line batch file and passing in the already-validated parameters. As the parameters are validated you don't have to wait for the ExecuteCommand, and can "release" the calling TM1 Web sheet immediately, so the user gets good perceived performance while the job in effect executes in the background.

Basically you already have the TI to be run by RunTI as your "version 1" where the clearing and loading is already limited to the relevant cube area - you just need a separate calling and parameter validation process and change the websheet action button to use the new process instead.
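A minimal sketch of the hand-off (all server, path, process and parameter names assumed - check the exact tm1runti flags against the 10.1 documentation):

    # Calling process behind the action button: validate first...
    IF(DIMIX('GeographicArea', pArea) = 0);
      ProcessError;
    ENDIF;
    # ...then launch the batch file without waiting (second argument = 0)
    ExecuteCommand('cmd /c C:\TM1\Scripts\LoadReporting.bat "' | pArea | '"', 0);

And the batch file itself:

    rem LoadReporting.bat - %1 carries the already-validated geographic area
    "C:\Program Files\ibm\cognos\tm1\bin\tm1runti.exe" -process Load.Reporting -adminhost localhost -server MyServer -user TIRunner -pwd secret pArea=%1

Because the wait flag is 0, the user's session returns immediately while the load runs in its own background session.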

And BTW thanks for the effort in describing the situation and laying out what has already been done and what the actual problem is! :D
Martin Ryan
Site Admin
Posts: 1988
Joined: Sat May 10, 2008 9:08 am
OLAP Product: TM1
Version: 10.1
Excel Version: 2010
Location: Wellington, New Zealand

Re: Running a TI only if the data has changed

Post by Martin Ryan »

I feel I should write a detailed response, given the excellent outline of your problem (we should reference it as an example in the Request for Assistance Guidelines!) but there's a simple answer: check the totals in the two cubes and see if they match. If they don't, run the TI. If they do, then no need to run. I'd still run the load every half hour or so regardless of whether this check says it's required, just to be on the safe side.
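Something like this in the Prolog of the scheduled process would do it - a sketch only, assuming each dimension has a single all-encompassing consolidation (every cube, dimension and element name here is invented):

    nInput = CellGetN('Input', 'Current Month', 'Total Area', 'Total Measures');
    nReport = CellGetN('Reporting', 'Current Month', 'Total Area', 'Total PM',
      'Total Status', 'Total Category', 'Total Region', 'Total Phase',
      'Total Type', 'Total Measures');
    IF(nInput = nReport);
      # totals match - nothing has changed, so skip the load
      ProcessQuit;
    ENDIF;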

Martin
Please do not send technical questions via private message or email. Post them in the forum where you'll probably get a faster reply, and everyone can benefit from the answers.
lotsaram
MVP
Posts: 3661
Joined: Fri Mar 13, 2009 11:14 am
OLAP Product: TableManager1
Version: PA 2.0.x
Excel Version: Office 365
Location: Switzerland

Re: Running a TI only if the data has changed

Post by lotsaram »

Martin Ryan wrote:I feel I should write a detailed response, given the excellent outline of your problem (we should reference it as an example in the Request for Assistance Guidelines!)
Agree wholeheartedly, especially given the standard of some questions lately.
Martin Ryan wrote:but there's a simple answer: check the totals in the two cubes and see if they match. If they don't, run the TI. If they do, then no need to run. I'd still run the load every half hour or so regardless of whether this check says it's required, just to be on the safe side.
That's a possibility but I tend to disagree. If the issue with end-user-driven loading of data to specific intersections was the wait time while the operation completed, then that can quite easily be gotten around with RunTI, so the user gets an immediate success or failure message while the process itself completes in a background session. A background sync process could still run with much lower frequency to mop up in case anything went wrong, but I would see this as more of a fallback design option.

Either way, the current design of frequent background processing of the entire cube's data is not a good approach; either targeted background processing as you suggest, or restoring the perceived performance of end-user-driven processing, would be better than the current solution.
whitej_d
Community Contributor
Posts: 103
Joined: Mon Sep 05, 2011 11:04 pm
OLAP Product: TM1
Version: 10.2
Excel Version: 2010

Re: Running a TI only if the data has changed

Post by whitej_d »

It's probably not the way you want to go anymore, but it is possible to use string measures for filtering in BI - you need to rule them across as attributes, and then the reports and corresponding filters can be written using the attributes instead of the string measure. A sketch is below.
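The attribute rule is only a line or two, in the geography dimension's attribute cube (all names assumed, and the string attribute has to be created first, e.g. AttrInsert('GeographicArea', '', 'Project Manager', 'S');):

    # Rule on }ElementAttributes_GeographicArea
    ['Project Manager'] = S: DB('Input', 'Current Month', !GeographicArea, 'Project Manager');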
Martin Ryan
Site Admin
Posts: 1988
Joined: Sat May 10, 2008 9:08 am
OLAP Product: TM1
Version: 10.1
Excel Version: 2010
Location: Wellington, New Zealand

Re: Running a TI only if the data has changed

Post by Martin Ryan »

tosca1978 wrote: 1. It took a long time - roughly 30 seconds (because the TI is run on demand as the user clicks the action button, anything more than 5-10 seconds is not really acceptable). CubeClearData couldn't be used because the user would not have sufficient security to copy all of the input cube's data into the reporting cube. CellPutProportionalSpread could be used on the numeric data, but the code to clear out string data uses a while loop - which takes longer.
2. The user would never have the correct security access to clear out all data in the measures dim of the reporting cube due to the element security needed.
I missed this last night (reading too late) and just wanted to correct you on something. When a TI process is run, it runs with full administrative privileges, regardless of who kicked the process off. This is so (for example) you can get the user to add a new element to a dimension, which normally they couldn't do. So you need to be very careful when you give users read access to TI processes.

So with respect to your specific problem there's no reason why the cube clear couldn't be done by the user. However I think the reason that your first process was slow was actually the interaction of web to TM1 and back. Java is notoriously slow. I've not used RunTI so not sure if lotsa's suggestion will get around this particular contributor of slowness, but certainly it's worth a try.

Lotsa: I agree that a full cube clear and reload is not the purest way of doing things, however it might be the most pragmatic, as it's dead easy to set up compared to doing a targeted reload (sketched below), and if you have logging turned off for the reporting cube then there's not really much of an overhead. Having said that, I'd probably go with a targeted approach myself unless the cube was very small. It's just a matter of how much effort tosca wants to put in.
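For the record, the targeted clear is only a few lines in the Prolog - a sketch with names assumed (and bear in mind ViewZeroOut only clears numeric cells; strings still need their own handling):

    sCube = 'Reporting';
    sView = 'zClearArea';
    sDim = 'GeographicArea';
    IF(ViewExists(sCube, sView) = 1);
      ViewDestroy(sCube, sView);
    ENDIF;
    ViewCreate(sCube, sView);
    IF(SubsetExists(sDim, sView) = 1);
      SubsetDestroy(sDim, sView);
    ENDIF;
    SubsetCreate(sDim, sView);
    # pArea = the geographic area passed in from the action button
    SubsetElementInsert(sDim, sView, pArea, 1);
    ViewSubsetAssign(sCube, sView, sDim, sView);
    ViewZeroOut(sCube, sView);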
mattgoff
MVP
Posts: 516
Joined: Fri May 16, 2008 1:37 pm
OLAP Product: TM1
Version: 10.2.2.6
Excel Version: O365
Location: Florida, USA

Re: Running a TI only if the data has changed

Post by mattgoff »

lotsaram wrote:
Martin Ryan wrote:but there's a simple answer: check the totals in the two cubes and see if they match. If they don't, run the TI. If they do, then no need to run. I'd still run the load every half hour or so regardless of whether this check says it's required, just to be on the safe side.
That's a possibility but I tend to disagree. If the issue with end-user-driven loading of data to specific intersections was the wait time while the operation completed, then that can quite easily be gotten around with RunTI, so the user gets an immediate success or failure message while the process itself completes in a background session. A background sync process could still run with much lower frequency to mop up in case anything went wrong, but I would see this as more of a fallback design option.

Either way, the current design of frequent background processing of the entire cube's data is not a good approach; either targeted background processing as you suggest, or restoring the perceived performance of end-user-driven processing, would be better than the current solution.
Sounds like the OP's TI v2 is getting the job done, but the problem is now log volume.
tosca1978 wrote:However - 95% of the time the TI doesn't need to run because no data has changed. The logs quickly build up which means the SaveDataAll chore that I have running every 30 mins takes a longer time.
(I acknowledge there's a bolded, underlined, blue sentence prefixed with "the question is" below this quote, but I'm going to dismiss it because I suspect what the OP really wants (triggers) is not possible.)

I think the OP misunderstands how SaveDataAll works though. SaveDataAll doesn't process the tx logs into .cub files, it writes new .cub files from whatever is currently in memory. SaveDataAll time is directly correlated to the size of stored data (non-ruled) in the cube and the performance of the disk subsystem, not the number of unsaved transactions. Save time on a cube is exactly the same if you change one or 1M cells. Also, there's no reason to run a SaveDataAll every 30 minutes as long as you're logging transactions. TM1 will transparently recover from a server crash by replaying the TX logs (admittedly more slowly if the uncommitted log volume is large). Nightly SaveDataAlls would be fine and work around the problem. Even if recovery time is excessive, crashes should be a rare enough problem that it's worth the trade-off.

But, assuming the issue is as the OP states, it sounds like the problem to be addressed is reducing tx log size. If we're going with a scheduled process (or lotsaram's asynchronous suggestion), interactive performance isn't important. If this is true, I have a v3 option which adds process time in exchange for a saving in log entries.

I've actually been thinking about this problem (unnecessary log entries) but haven't implemented anything yet. I've often griped about replication - it's a little flaky from time to time, but a key issue for us is performance. Each planet reps every hour, and it takes 2-3 minutes depending on the size of the country (we have a substantially larger presence in some countries) and network performance. That means that as much as 5% of my server's availability is lost to replication (and it's a big source of user frustration, since we're on 9.5.1 which lacks Parallel Interaction). A significant reason for the rep duration is that I also do a single-period wipe/reload of our GL on the star every hour. That rep duration could be substantially reduced if I could do a targeted refresh instead of the whole period.

The key is a way to skip the ViewZeroOut prior to data load. In short, the normal GL refresh process would, in the metadata tab, set a flag to indicate that a balance was present in source during the last refresh (these flags would have to be cleared in the Prolog, but logging of flag-related txs would not be necessary). After the data is loaded, a second process would traverse the cube and zero any cells where the flag is not set. Since TM1 is smart enough to only write a tx log if a cell changes, simply over-writing a cell with the same data is faster than doing a CellGetN/if/then and results in the same number of log entries.
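As a sketch (names assumed, and shown as data-tab code for simplicity rather than the metadata tab):

    # GL load process, data tab: write the value and set the flag beside it
    CellPutN(nAmount, 'GL', vPeriod, vCostCentre, vAccount, 'Amount');
    CellPutN(1, 'GL', vPeriod, vCostCentre, vAccount, 'Loaded');

    # Follow-up process, data tab (source = view of the cube's 'Amount'
    # cells): zero anything the load didn't touch this time around
    IF(CellGetN('GL', vPeriod, vCostCentre, vAccount, 'Loaded') = 0);
      CellPutN(0, 'GL', vPeriod, vCostCentre, vAccount, 'Amount');
    ENDIF;

The 'Loaded' flags themselves would be cleared in the Prolog, with no need to log the flag writes.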

But, back to the OP, I think the best solution is just to move SaveDataAlls to nightly and call it a day. :D

Matt
Please read and follow the Request for Assistance Guidelines. It helps us answer your question and saves everyone a lot of time.
Martin Ryan
Site Admin
Posts: 1988
Joined: Sat May 10, 2008 9:08 am
OLAP Product: TM1
Version: 10.1
Excel Version: 2010
Location: Wellington, New Zealand

Re: Running a TI only if the data has changed

Post by Martin Ryan »

There's also CubeSetLogChanges('MyCube', 0); which turns logging off for the given cube. If the data comes from a replicable source (like another cube or a database) instead of a person then it's often a good idea to simply turn logging off. It prevents log file bloat and it also makes the TI process run a bit faster.

However if replication is involved then of course this isn't an option, as the logs are required for the replication.
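Typical placement, if the cube were called 'Reporting':

    # Prolog: suspend transaction logging for the bulk load
    CubeSetLogChanges('Reporting', 0);

    # Epilog: turn it back on if subsequent changes should be logged
    CubeSetLogChanges('Reporting', 1);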
tosca1978
Posts: 101
Joined: Thu Oct 20, 2011 6:53 am
OLAP Product: TM1
Version: 9.5.2
Excel Version: 2007
Location: London, UK

Re: Running a TI only if the data has changed

Post by tosca1978 »

Thanks very much to everyone for your replies - it's really appreciated.
lotsaram wrote:My suggestion would be to use RunTI.exe, which is included with 10.1. The solution would be to segregate the VALIDATION of the parameters for clearing data in the reporting cube and loading new data from the actual EXECUTION. This is done by calling RunTI via a command line batch file and passing in the already-validated parameters. As the parameters are validated you don't have to wait for the ExecuteCommand, and can "release" the calling TM1 Web sheet immediately, so the user gets good perceived performance while the job in effect executes in the background.
This makes sense - user doesn't feel the pain of waiting for the data to be cleared and reloaded....
Martin Ryan wrote:there's a simple answer: check the totals in the two cubes and see if they match. If they don't, run the TI. If they do, then no need to run. I'd still run the load every half hour or so regardless of whether this check says it's required, just to be on the safe side.
This is a good idea. The only problem is that it may be string data that has changed since the last time the TI ran - for example, the user changes the Project Manager measure. The data in the reporting cube would need to be deleted from PM A and reloaded against PM B, since Project Manager is a dimension - and a string change wouldn't move the numeric totals.
whitej_d wrote:It's probably not the way you want to go anymore, but it is possible to use string measures for filtering in BI, but you need to rule them as attributes, then the reports and corresponding filters can be written using the attributes instead of the string measure.
Thanks. We've got the reporting into a good place so don't think we will change things now. However, I did exactly as you said and the BI report developer had a terrible time trying to work with my attributes. He could only see 4 out of 10 and even then struggled to utilise them!
Martin Ryan wrote:I missed this last night (reading too late) and just wanted to correct you on something. When a TI process is run, it runs with full administrative privileges, regardless of who kicked the process off
Thanks Martin - I didn't know that. I guess with a targeted approach this doesn't help me much... if the user runs a TI that has CubeClearData in the Prolog but is only going to reload data for his geographical area of the cube in the data tab. However, I presume this means that if the data tab were to load the whole geographical hierarchy (even areas that the user does not have write access to), it would load successfully, since the TI runs as admin? This changes things a bit...
mattgoff wrote:But, back to the OP, I think the best solution is just to move SaveDataAlls to nightly and call it a day.
Matt - I got a little lost in what you were suggesting with replication - but thanks for setting me straight on SaveDataAll...
Martin Ryan wrote:There's also CubeSetLogChanges('MyCube', 0); which turns logging off for the given cube. If the data comes from a replicable source (like another cube or a database) instead of a person then it's often a good idea to simply turn logging off. It prevents log file bloat and it also makes the TI process run a bit faster.
Martin - again very helpful - and really I should have thought of this sooner.

So, piecing all of your advice together I have a plan...

1. Turn off cube logging for the reporting cube, since the only logging I need is on the input cube - this will reduce the transaction logs
2. Add SaveDataAll to a nightly admin chore instead of every 30 mins
3. Delete the chore that runs every 2 mins to clear and reload data into the reporting cube
4. Add a TI to the action button that captures and validates parameters for RunTI.exe - the user feels no pain of execution
5. Call RunTI.exe via a command line batch file

The other approach I was thinking of was replacing steps 4 and 5 with:

4. Add a TI to the action button that sends a "1" trigger to a one-cell cube - should be very quick for the end user
5. Re-instate my 2-min chore to clear and load data only when there is a "1" in the trigger cube, and reset the trigger in the Epilog (sketched below)

Only because I have never set up RunTI.exe before, so I'm not sure how complicated it is.
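The trigger version would be as simple as this, I think (assuming a small two-dimensional control cube called 'zTrigger'):

    # Action button TI: flag that something changed
    CellPutN(1, 'zTrigger', 'Reporting Load', 'Flag');

    # Prolog of the 2-min chore process: do nothing if the flag isn't set
    IF(CellGetN('zTrigger', 'Reporting Load', 'Flag') = 0);
      ProcessQuit;
    ENDIF;

    # Epilog, after the clear and reload: reset the flag
    CellPutN(0, 'zTrigger', 'Reporting Load', 'Flag');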

If you do get a chance to feed back on my plan I'd appreciate it - but if not then thanks for all your help - it's been much appreciated.

Cheers
lotsaram
MVP
Posts: 3661
Joined: Fri Mar 13, 2009 11:14 am
OLAP Product: TableManager1
Version: PA 2.0.x
Excel Version: Office 365
Location: Switzerland

Re: Running a TI only if the data has changed

Post by lotsaram »

Your plan seems sound to me.

Writing to a flag cube and having a regularly scheduled background job would also work for a targeted clear and load - just remember to clear the flags. But RunTI is not that difficult to set up; IBM actually has surprisingly good documentation on it - the problem is only in finding the documentation.
kugors
Posts: 24
Joined: Tue Dec 06, 2011 10:21 am
OLAP Product: TM1
Version: 9.5.2
Excel Version: 2010

Re: Running a TI only if the data has changed

Post by kugors »

Maybe leave the decision about processing the current intersection to the users. You could add a new measure (for example "Process") and tell users to put a 1 in it when they want to update the reporting cube. Your TI could then work only on intersections with a 1 (or <> 0), send the data to the reporting cube and put the "Process" flag back to zero. That kind of TI will run fast, because it only touches changed data.
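As a sketch (cube, measure and variable names assumed), with the TI sourced from a view of the input cube:

    # Data tab: only act on intersections the user has flagged
    IF(CellGetN('Input', vVersion, vArea, 'Process') <> 0);
      # ...copy this intersection's measures to the reporting cube...
      CellPutN(0, 'Input', vVersion, vArea, 'Process');
    ENDIF;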