Best practice for creating view for dimension

Post Reply
GregaJ
Posts: 4
Joined: Wed Jun 16, 2010 7:58 am
OLAP Product: TM1, SSAS
Version: 9.4 9.5 2000-2008
Excel Version: 2000-2010

Best practice for creating view for dimension

Post by GregaJ »

Hi,

We update two dimensions daily in our budgeting process, because the business opens new products for budgeting every day. My question is: what is the best practice for keeping a sorted view (the dimension has 3 levels) up to date after each load?

We currently use a dynamic (MDX) expression so the view is always up to date, but I am wondering whether, for performance, it would be better to rebuild the view as a static set in the Epilog after updating the dimension.

So is it better to update the view as the last step of the dimension update, or do you use some other approach?

Thank you

Grega
rozef
Posts: 74
Joined: Thu Jun 17, 2010 10:35 am
OLAP Product: TM1
Version: 9.4 9.5.1
Excel Version: 2003 - 2007

Re: Best practice for creating view for dimension

Post by rozef »

GregaJ wrote:I am wondering whether, for performance, it would be better to rebuild the view as a static set in the Epilog after updating the dimension
No, MDX subsets are always faster than static subsets, with one exception: if you update a dimension while an MDX subset on that dimension is the data source, the subset will be re-evaluated for each modification, which is very inefficient.
David Usherwood
Site Admin
Posts: 1458
Joined: Wed May 28, 2008 9:09 am

Re: Best practice for creating view for dimension

Post by David Usherwood »

Interesting.
At Black Belt last year the product team were pretty strong on not using dynamic subsets too much - the rationale being that _any_ change to _any_ data on the whole server caused a dynamic subset to be re-evaluated.
lotsaram
MVP
Posts: 3706
Joined: Fri Mar 13, 2009 11:14 am
OLAP Product: TableManager1
Version: PA 2.0.x
Excel Version: Office 365
Location: Switzerland

Re: Best practice for creating view for dimension

Post by lotsaram »

rozef wrote:No, MDX subsets are always faster than static subsets, with one exception: if you update a dimension while an MDX subset on that dimension is the data source, the subset will be re-evaluated for each modification, which is very inefficient.
My recollection is exactly the same as David's. I would have to disagree and say that dynamic subsets, or MDX in derived member sets in general, are almost always slower than retrieving a static list. For most subsets the difference is probably not perceptible or not enough to matter, but for large dimensions or large subsets the difference can sometimes be very significant.

I try to avoid using public named dynamic subsets in any high concurrency application and never use them in TI but I'm not opposed to using them in general as they certainly can be useful.

To get back to GregaJ's question. For me it would depend on the nature of the application and if it is a very large dimension whether there are any performance issues with the dynamic subset. If the TI update runs nightly then a static subset generated at the conclusion of the update will always be up-to-date. I would lean to this for a "large scale" application. But if the dimension is not that big and you have not that many users and no locking issues with dynamic subsets then I would question whether this would be worth the extra effort.
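A minimal TI sketch of that approach (the object names 'Product' and 'Budget Products' are invented for illustration): rebuild the static subset in the Epilog once the dimension update has finished, so the view's subset always matches the freshly loaded dimension.

```
# Epilog sketch - hypothetical names. Rebuild the static subset after
# the dimension update so it always reflects the current dimension.
sDim = 'Product';
sSub = 'Budget Products';

IF( SubsetExists( sDim, sSub ) = 1 );
  # Empty rather than destroy, so saved views keep their reference
  SubsetDeleteAllElements( sDim, sSub );
ELSE;
  SubsetCreate( sDim, sSub );
ENDIF;

# Copy every element in dimension (registry) order into the subset
i = 1;
WHILE( i <= DIMSIZ( sDim ) );
  SubsetElementInsert( sDim, sSub, DIMNM( sDim, i ), i );
  i = i + 1;
END;
```

Emptying the subset instead of destroying it avoids invalidating any saved views that reference it.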
rozef
Posts: 74
Joined: Thu Jun 17, 2010 10:35 am
OLAP Product: TM1
Version: 9.4 9.5.1
Excel Version: 2003 - 2007

Re: Best practice for creating view for dimension

Post by rozef »

David Usherwood wrote:At Black Belt last year the product team were pretty strong on not using dynamic subsets too much
Interesting too; I learned it from an expert who had just passed this certification, but maybe he changed his mind afterwards.

In my experience, I always use MDX statements in TI processes because they are a lot faster than normal subsets when you use a cube view as the data source, even with only one element inside. But these are temporary subsets that I destroy in the Epilog, so they cannot be re-evaluated. In this case it is faster to build dynamic subsets than to loop over the dimensions, and the speed of data reading looks faster too.
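For readers unfamiliar with this pattern, a sketch (all object names are made up, and the view 'tmp_Export' is assumed to already exist): create the subset by MDX in the Prolog, assign it to the data-source view, then destroy it in the Epilog so nothing can ever re-evaluate it later.

```
# Prolog sketch - 'Product', 'Budget' and 'tmp_*' are hypothetical names
sSub = 'tmp_Leaves';
SubsetCreateByMDX( sSub, '{TM1FILTERBYLEVEL( TM1SUBSETALL( [Product] ), 0 )}' );
ViewSubsetAssign( 'Budget', 'tmp_Export', 'Product', sSub );

# Epilog - remove the temporary subset so it cannot be re-evaluated
SubsetDestroy( 'Product', sSub );
```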

You are certainly right that it can slow your server on any change, but in my opinion you really need a huge quantity of metadata and dynamic subsets to feel the difference. For example, the Black Belt trainer I know runs a server with 200+ GB of RAM.
lotsaram
MVP
Posts: 3706
Joined: Fri Mar 13, 2009 11:14 am
OLAP Product: TableManager1
Version: PA 2.0.x
Excel Version: Office 365
Location: Switzerland

Re: Best practice for creating view for dimension

Post by lotsaram »

rozef wrote:In my experience, I always use MDX statements in TI processes because they are a lot faster than normal subsets when you use a cube view as the data source, even with only one element inside. But these are temporary subsets that I destroy in the Epilog, so they cannot be re-evaluated. In this case it is faster to build dynamic subsets than to loop over the dimensions, and the speed of data reading looks faster too.

You are certainly right that it can slow your server on any change, but in my opinion you really need a huge quantity of metadata and dynamic subsets to feel the difference. For example, the Black Belt trainer I know runs a server with 200+ GB of RAM.
Your experience would certainly go against the collective experience of most if not all of the experts posting on this forum. Dynamic subsets should always be avoided as data sources because they tend to get re-evaluated on each record during processing (if any metadata or data change is made, i.e. anything beyond a text output) and are therefore generally MUCH MUCH slower than static subsets. For similar reasons it is often much faster to do a process in 2 steps, outputting a view to file and then processing the file, rather than using a view as a direct data source.
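The two-step pattern might look like this (process and file names are illustrative): the first process only reads the view and writes a flat file; a second process then uses that file as its data source to make the metadata and data changes.

```
# Process 1, Data tab sketch: the data source is a cube view on 'Budget';
# vProduct, vPeriod, vValue are the view's data-source variables.
ASCIIOutput( 'budget_export.csv', vProduct, vPeriod, NumberToString( vValue ) );

# Process 2 then takes budget_export.csv as its data source and writes
# the values back with CellPutN - no view or dynamic subset is held
# open while the heavy processing runs.
```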

I would be happy to be proven wrong. In fact I would be ecstatic as this would simplify much development. But I think you should check your facts as I believe you are wrong on this one.
User avatar
Mike Cowie
Site Admin
Posts: 484
Joined: Sun May 11, 2008 7:07 pm
OLAP Product: IBM TM1/PA, SSAS, and more
Version: Anything thru 11.x
Excel Version: 2003 - Office 365
Location: Alabama, USA
Contact:

Re: Best practice for creating view for dimension

Post by Mike Cowie »

rozef wrote:
David Usherwood wrote:At Black Belt last year the product team were pretty strong on not using dynamic subsets too much
Interesting too; I learned it from an expert who had just passed this certification, but maybe he changed his mind afterwards.
Or, maybe you and/or the expert misunderstood the advice from the training? After all, you don't need 100% on the Master-level exam in order to pass it. ;) I believe I was at the same training session referred to by Mr. Usherwood and there are definitely a number of reasons to be cautious with dynamic subsets, particularly where locking and multiple users on a TM1 server are concerned. Dynamic subsets are definitely a great feature, but as with most things there are tradeoffs when using them under a number of circumstances.
rozef wrote: In my experience, I always use MDX statements in TI processes because they are a lot faster than normal subsets when you use a cube view as the data source, even with only one element inside. But these are temporary subsets that I destroy in the Epilog, so they cannot be re-evaluated.
Yes, you may not see as much of a performance hit if you use dynamic subsets in your TI process, but this approach does have a higher likelihood of causing unwanted (and avoidable) locks on your TM1 server that could disrupt performance for other users while your TI process runs. Not to mention the potential for dynamic subsets to get recalculated when *anything* on the TM1 server changes.
rozef wrote: In this case it is faster to build dynamic subsets than to loop over the dimensions, and the speed of data reading looks faster too.
Faster to build a dynamic subset than a static subset - yes, I would agree that writing the TI script to do the same kind of operations that you can do very simply with a good subset MDX expression is more challenging. But, you can very easily generate static subsets from dynamic subsets via a generic TI process that writes the elements from the dynamic subset to a text file then uses that as a data source to populate a static subset. In fact, this was an approach recommended at the same Black Belt training mentioned by David for generating static versions of dynamic subsets.

However, I question your conclusions about "speed of data reading". I'd like to see hard evidence that dynamic subsets perform better than static subsets backing up these kinds of statements - it runs contrary to anything I and many others have seen and heard with regards to dynamic vs static subsets. Quite frankly, saying that a subset based on a dynamic expression is faster than a subset based on a static, saved list just doesn't make sense - it's like saying the extra effort of always calculating the dynamic subset is somehow more efficient than already knowing the result!
rozef wrote: You are certainly right that it can slow your server on any change, but in my opinion you really need a huge quantity of metadata and dynamic subsets to feel the difference.
Again, it's not just about the performance of your specific TI process' operations and whether or not you can measure the speed difference of a discrete operation. It's what you could be doing to other users who also need to use the TM1 server while your TI process is running. If you have very few concurrent users and/or your TI processes run very quickly (in seconds) then this may not be a concern for you, and you can count yourself among the lucky. It is, however, a real concern in the design of many real-world TM1 applications.
rozef wrote: For example, the Black Belt trainer I know runs a server with 200+ GB of RAM.
Yes, 200+ GB of RAM is a large TM1 model. I'm not trying to single out your expert here, but to me a 200+ GB TM1 model doesn't prove anything by itself. Good TM1 design and an understanding of TM1 fundamentals aren't measured by RAM footprint: it's very easy, with sub-optimal feeders and poor cube design, to use up hundreds of GB of RAM. Does that mean all large TM1 models are poorly designed? Of course not, but model size alone isn't a credential in my opinion.

Regards,
Mike
Mike Cowie
QueBIT Consulting, LLC

Are you lost without Print Reports in Planning Analytics for Excel (PAfE)? Get it back today, for free, with Print Reports for IBM Planning Analytics for Excel!
rozef
Posts: 74
Joined: Thu Jun 17, 2010 10:35 am
OLAP Product: TM1
Version: 9.4 9.5.1
Excel Version: 2003 - 2007

Re: Best practice for creating view for dimension

Post by rozef »

Edit:
Alright, maybe I got lucky, because I have tried it many times and it has always been faster with MDX than without.
I will test it more thoroughly if I find the time.

By the way, I still think these slowdowns can only happen on large servers or with many concurrent interactions, because I have never heard of them otherwise.

@lotsaram: You are always so sweet when you explain something to me; it is a real pleasure. ;)

Cheers.
User avatar
Mike Cowie
Site Admin
Posts: 484
Joined: Sun May 11, 2008 7:07 pm
OLAP Product: IBM TM1/PA, SSAS, and more
Version: Anything thru 11.x
Excel Version: 2003 - Office 365
Location: Alabama, USA
Contact:

Re: Best practice for creating view for dimension

Post by Mike Cowie »

rozef wrote:Edit:
Alright, maybe I got lucky, because I have tried it many times and it has always been faster with MDX than without.
I will test it more thoroughly if I find the time.
By all means, please share any evidence to the contrary...

I felt like killing time over lunch and ran some really simple tests via TI to try and re-validate some of what's been raised here (since we're already way off topic).

My data (totally made-up sample data set - designed to be fairly large in order to make sure any tests/timing could be measured):
- Simple cube with 4 dimensions: Scenarios (3 members), Time Periods (12 members), Centers (2,600 members), Accounts (400,000 members).
- One scenario in the cube is populated with about 62,000,000 values
- The account dimension has a subset, based on a filter by expression, that results in 400 members in the subset. A version of that subset is saved as a public, dynamic subset and another is saved as a public, static subset. Both are ultimately identical in terms of the members they contain.
- All tests done in TM1 9.5.2 (no hotfixes applied, I don't think)

First test: Create a new dimension and populate an attribute in it based on a TM1 Subset data source
I realize you and others know this to be a known problem with dynamic subsets as data sources, but wanted to quantify it.

Populate a new dimension and an attribute in that dimension from my Static subset: 0 seconds (couldn't really measure the time).
Populate a new dimension and an attribute in that dimension from my Dynamic subset: 10-11 minutes

Not too surprising if you've ever run into this before - incidentally, it appeared that updating the attribute value (a data change) was primarily the issue because if I remove that and only add elements to the new dimension (a metadata change) the execution times between static vs dynamic are similar. I'm not sure I had made that distinction before since I've long since abandoned ever using dynamic subsets as data sources, but thought I'd mention it.

Second test: Copy data in one scenario to a different scenario based on views using a subset to filter data being exported
In this case, I decided not to even bother dynamically creating/destroying views or subsets (which have the potential to cause additional server/object locking issues, incidentally) - I just used one of two public views:
1) View export based on the earlier static subset for a static scenario.
2) View export based on the earlier dynamic subset for a static scenario.
In each process I had a view that zeroed out the target scenario (all static subsets), which was completed in the Prolog.

The results (repeated 5 times):
Copy scenario data using export view with Static subset: 30-31 seconds each run
Copy scenario data using export view with Dynamic subset: 31-33 seconds each run

So, this obviously isn't the most scientific test and testing equipment (my own computer) in the world, but at the very least there's nothing here that would suggest the dynamic subset approach is more efficient/faster. NOTE: Any time measurements here were gathered by capturing a time stamp in the Prolog and another in the Epilog and writing that out via ASCIIOUTPUT to get a start/end time.
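The timing method described above can be sketched like this (the output file name is arbitrary):

```
# Prolog - capture the start time
sStart = TIMST( NOW, '\h:\i:\s' );

# Epilog - capture the end time and write both out
sEnd = TIMST( NOW, '\h:\i:\s' );
ASCIIOutput( 'timing.txt', 'Start: ' | sStart, 'End: ' | sEnd );
```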
rozef wrote:By the way, I still think these slowdowns can only happen on large servers or with many concurrent interactions, because I have never heard of them otherwise.
Just because you haven't heard about something doesn't mean it isn't true or uncommon.

And I think this is why you've gotten some grief here (albeit maybe not always in the nicest of tones, I'll grant you, but then you have been pretty strong in your assertions to the contrary). I, and I'm sure David, Lotsa and a number of others on the TM1 forum, have worked with and come across many models that, as developed, work wonderfully with one or a couple of users at any given time. Not all of these models are extremely intricate or large, but once you get any significant number of concurrent users hitting the model and (especially) updating values, reading views at a consolidated level, running TI processes, etc. at the same time, performance can go downhill very quickly if you don't design your TM1 model to avoid some of those potential pitfalls.

And this is the crucial point here: if you think only about how an individual process runs in isolation, you are ignoring a bigger part of the overall performance picture in many TM1 applications. Saying that "MDX subsets are always faster than static subsets" is a dangerous (and, until proven otherwise, false) statement without the proper context for less-informed users reading the forum. There are broader implications with regards to server/object locking, in particular, when you are dealing with things like high-concurrency planning applications, where the choice of dynamic vs static subsets can be a crucial one in designing certain application elements. In the case of the original poster, and in many smaller or more static reporting-type/read-only TM1 applications, the choice of dynamic vs static subsets may be of little consequence. I use dynamic subsets all the time because they are incredibly useful and efficient to create, but there are often situations where I'm trying to squeeze out better performance or improve concurrency, and I may adjust how dynamic subsets are being used in that application. I wish that weren't the case, but it is, and I don't think you'll find too many at IBM or on this forum who will disagree.

Regards,
Mike
Mike Cowie
QueBIT Consulting, LLC

User avatar
paulsimon
MVP
Posts: 808
Joined: Sat Sep 03, 2011 11:10 pm
OLAP Product: TM1
Version: PA 2.0.5
Excel Version: 2016
Contact:

Re: Best practice for creating view for dimension

Post by paulsimon »

Hi Grega

The original question has sparked some quite interesting debate. Many thanks to Mike for running and publishing those tests. It is good to have TM1 folklore quantified.

I also agree with Mike's point that people should not be afraid to use dynamic subsets. They are very useful, and there are some cases where the dynamism is necessary. In small applications they are fine. However, when you get to bigger apps with dimensions that have a lot of elements, you need to be careful about locking. Having said that, locking is not necessarily a problem if the cube is only updated overnight when no one is using it. The application I am currently working on is 40GB with tens of thousands of elements in some dimensions, and it is updated potentially every minute, 24 hours a day, while reporting is going on, so locking has been a major consideration.

One suggestion I have for the original question is that if:

1) You have the dimension that is being updated on rows (and I guess columns, though I haven't tested) but not in the title area (+), and you have the 'All' subset selected, i.e. you click Show All Elements before you save the view
2) You have the dimension sort order set to the order that you want (*)

Then, when you insert an element into the dimension it will automatically appear in the View in the right place. There is no need to update any Subsets or the View.

(+) It still works in terms of updating the dimension if the dimension is in the title position in the View, but the gotcha is that if you insert an element that gets sorted before the one currently in the title position, all the elements get shifted down one and your view will show a different title element being selected. However, my guess is that your view has the dimension on rows. This happens because the View internally uses the element position in the .vue file rather than its name. It would be nice to see IBM implementing the same Name or Position option for Views that they have for Subsets.

(*) I find the best way to specify the physical sort order for a dimension is to CellPutS into the }DimensionProperties Cube rather than using DimensionSortOrder, as it is more reliable and seems to work every time rather than just when the dimension is empty.
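As a sketch of that suggestion (the dimension name is made up, and the property element names are as I recall them; verify against the }DimensionProperties cube in your own model):

```
# Set the sort order for dimension 'Product' by writing directly to the
# }DimensionProperties control cube instead of using DimensionSortOrder
CellPutS( 'ByName',    '}DimensionProperties', 'Product', 'SORTELEMENTSTYPE' );
CellPutS( 'Ascending', '}DimensionProperties', 'Product', 'SORTELEMENTSSENSE' );
```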

Regards

Paul Simon
lotsaram
MVP
Posts: 3706
Joined: Fri Mar 13, 2009 11:14 am
OLAP Product: TableManager1
Version: PA 2.0.x
Excel Version: Office 365
Location: Switzerland

Re: Best practice for creating view for dimension

Post by lotsaram »

I also decided to do my own experiment. Maybe this is a little more scientific than Mike's, probably not that much, but it certainly is repeatable. I decided on 6 test cases to compare between processing performance of dynamic vs static subsets. The tests are in 3 pairs (meta data change to model, data change to model, no change to model). Everything was tested on 9.5.1. The tests are:
- copy dynamic subset to static subset
- copy static subset to static subset
- process dynamic subset and update attributes
- process static subset and update attributes
- output dynamic subset to file
- output static subset to file
(Attachment: DynamicSubTest.jpg)
The dimension I used was sizable (100K leaf elements across 10 levels) but definitely not huge by any means. The MDX expression was relatively simple but gave the server some work to do: a recursive drilldown on the top-level element, a filter for N-level elements, then a filter on member name to exclude parts of the dimension.
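An expression of roughly that shape might look like the following ([BigDim], [Total] and the 'Excl*' pattern are invented for illustration):

```
{ EXCEPT(
    TM1FILTERBYLEVEL(
      TM1DRILLDOWNMEMBER( {[BigDim].[Total]}, ALL, RECURSIVE ),
      0 ),
    TM1FILTERBYPATTERN( TM1SUBSETALL( [BigDim] ), 'Excl*' ) ) }
```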

The results speak for themselves. Using dynamic subsets as data sources is a no-go. I haven't done this for a long, long time, but I was surprised by the extent of the impact. I started out wanting to run tests with sizeable dynamic and static subsets with identical members, but quite honestly I just gave up waiting for the process to ever finish with a sizeable dynamic subset as a data source, and so ended up cutting the dynamic subset down to a size where I could be bothered waiting. If someone can come up with a case where it is faster to read from a dynamic subset, I'd like to see it.

As expected, if anything in the model is updated, processing performance with dynamic subsets barfs; processing speed on a simple file output is comparable, but you still wear some additional overhead for evaluating the MDX versus reading straight from the static list. This, combined with the performance difference in subset copy, is a perfect example of why it is best practice to output a dynamic subset to a file and then use the file either for subsequent processing or to create a static subset. It may be more work, but it is simply faster.

I think Mike also hit the nail on the head with the distinction between faster to code versus faster to run, although with reusable code snippets and modular TI coding from Bedrock I'm not sure I buy the argument that a dynamic subset is quicker to code than a static one. It's pretty clear to me that as a developer you have a responsibility to code an application that runs as fast and efficiently as possible; you also never know when more users, the impact of locking, or slowdown from evaluating rules and feeders might come to bite you...

In the interests of education, scientific method, peer review and all that I have this available as a demo model so anyone else who wants to replicate this is welcome to do so. Just PM me and I will email it to you as I got the zip down to 2MB but still nowhere near the 512KB upload limit for attachments on this board. (If anyone would care to suggest a file sharing alternative to emailing let me know.)
rozef wrote:@lotsaram: You are always so sweety when you explain something to me, it is a real pleasure. ;)
I am more than happy to do my best to explain concepts to those who are willing to listen and learn, like many people I would rather not waste my time on those who aren't. I'm not sure which camp you belong to, for now you get the benefit of doubt.
User avatar
stephen waters
MVP
Posts: 324
Joined: Mon Jun 30, 2008 12:59 pm
OLAP Product: TM1
Version: 10_2_2
Excel Version: Excel 2010

Re: Best practice for creating view for dimension

Post by stephen waters »

paulsimon wrote:Many thanks to Mike for running and publishing those tests. It is good to have TM1 folklore quantified.
Paul Simon
I don't want to (or rather, can't) add anything on the technical side, but I have heard that IBM are very aware of the performance issues with dynamic subsets and are working to improve them. So hopefully at some stage the folklore will be superseded, though I'm not sure when. Fingers crossed for the next release!
Post Reply