Page 1 of 1

Delete Elements Containing No Data in any Cube

Posted: Thu Mar 24, 2011 8:01 pm
by normanbobo
I have looked for a solution to this problem on this forum and other forums without success.

The question is simple: how can you delete the elements from a dimension which have no values posted to them in any cube.

The question is simple, but the factors in our situation are difficult:

1) Large cube: 16 dimensions (including the measure dimension). 17 meausures 3.5 GB + RAM (and growing).

2) Many attributes on dimensions: Several of the dimensions have very many attributes -- one dimension has over 70 attributes.

3) Many large volume data sources: 11 data sources used to load the dimensions (remaining dimensions manuall created). 9 data sources used to load the facts (cells). Across all data sources, we are loading 13.3 million rows of data (and growing). Note that these rows are generated in SQL statements with GROUP BY clauses, causing the data to be summarized up so that there is only one row of data per cell in the cube. (The underlying tables have hundreds of millions of rows in total.) It takes 3+ hours to load the cube cells.

4) We are already designing additional cubes which will re-use many of the same dimensions. Thus the process must handle testing each element to see if it is used in *any* cube.

5) We have a large number of dimensional elements in our dimensional reference tables for which there are no records in the fact data. The users have required that we delete elements which have no data.

6) We also have to distinguish between a "no data" and "zeros". There could be records which sum up to zero (e.g. reversing transactions across fiscal periods). We only want to delete elements with absolutely no data. To help handle this (and to help with control total validations), we maintain a record count measure which is incremented by one for each input record.

Options considered and rejected:

1) Using suppress zeros features in the interfaces is not an adequate solution for three reasons:

a) There are so many unused dimensional elemnts, to keep them around would cause all of the queries to be much slower.
b) The elements would appear in the user interfaces (in our case, Analysis Studio, but also in Excel and Perspectives). This would significantly clutter the interface, making it difficult for users to find the elements they need.
c) The users are not used to seeing the unused dimensional elements in reports and other tools. It actually caused them to doubt the quality of the system, just because we were displaying these unused elements.

2) Load the facts and the dimensions in the same process. Use queries which pull in all of the dimensional attributes with each row of fact data. The problem with this approach is the volume of data ... millions of fact rows requiring as many as 200-250 columns on each row to hold all of the attributes for all of the dimensions. We did not have the resources to build a database infrastructure to run such queries and were doubtful that the we could read that much data through ODBC interfaces in any reasonable period of time. Turbo Integrator is fast, but ODBC is not.

3) Build a list from each source fact table of the elements used in each dimension. For each dimension, merge the lists. Filter the dimensional loads to use on only the dimensional elements appearing in the facts. Our issue here was that most of the work would be done in ETL tools and/or tricky SQL queries (which would probably run forever). This solution was just too "kludgy." It had too many moving parts and too much complexity.

4) We tried this: Load all of the dimensions and all of the facts. Run a process to spin through each element in each dimension. For each element, create a view and select the total record count for that dimensional element. If there were no records, delete the element. The issue we ran into with this process was locking. Th process ran out of memory (even after increasing the RAM to 12GB). It caused the TM1 server to crash (exceeded the Windows swap file size). The process also took forever. We are confident that both the memory and performance issues were related to locking. Working with IBM product managers and product support, we were never able to get that process to work. We tried many things to get the locking turned down or off without success.

Current process: we create an attribute on each dimension labeled "has data". We initialize all of the values to "no". Each time we load a fact record, we mark all of the leaf level elements used in that record with 'Yes" in this attribute. We then run a process which examines the leaf level elements. When a leaf level element is set to "Yes", the process marks all of the parents of that leaf level element with a "Yes", a process we call "percolating" the values. When all of the values are percolated, we then spin through the dimension and delete all of the elements whose value is still "No." The issue with this solution is that we have to completely rebuild the cube (3+ hours) any time we want to make any dimension changes (because it is the fact loads which determine which elements are used and set the attribute to "Yes").

I would really like to return to a process which does not use the 'has data' attributes but instead just queries the cubes looking for dimensional elements which are not used. But, as discussed, we tried that for a long time without success. There may be some ideas around how to run views to create the list of elements to be deleted or some variations on what we were trying.

I have to believe that there are others who have faced a similar challenge and have developed a solution. Any ideas?

Re: Delete Elements Containing No Data in any cube

Posted: Thu Mar 24, 2011 8:41 pm
by tomok
Just one question: What makes you so sure that removing unused dimension elements is going to make your system that much faster? Have you done any testing, comparing the full-loaded dimensional models against one where you've removed the unnecessary ones? The reason I ask is that TM1 does a really good job of not allocating memory to empty cells in the cubes, they call it their sparse consolidation algorithm. Weeding out these unused elements may not necessarily make your system perform that much better. Of course there is way more to it than that, and I personally wouldn't want the unused elements there either, but it may not be a silver bullet for you.

If it were me I would do my dimension maintenance TI processes against a view where I joined the "fact" table(s) (the one with the data) against the "metadata" table(s). If there is no data in the fact table then it won't appear in the view (due to the magic of the SQL JOIN function). If you have multiple fact tables the SQL may get complicated with multiple JOINS but it should work. if the SQL gets hairy you can create separate views as stepping stones and then join them all together in the end to get the final result. If I were to do a TM1 model where there may be "extra" records in the metadata tables I would join it against the fact table(s) so that those additional records wouldn't become part of the dimension.

Re: Delete Elements Containing No Data in any Cube

Posted: Thu Mar 24, 2011 9:47 pm
by lotsaram
You could introduce the concept of "meta data control" dimensions whcih would essentially be coppies of the actual dimensions but are created from processing zero suppressed (leaf only!) views. You then just re-process the main dimensions and do a Dimix test on its meta data control analouge and just zap the elements not in the meta data control dimension. You might then need to do a round of removing orphan consolidations and consolidations with no ultimate leaf descendants.

A little time consuming and processing intensive but sure to work.

Re: Delete Elements Containing No Data in any cube

Posted: Thu Mar 24, 2011 10:12 pm
by normanbobo
tomok wrote:Just one question: What makes you so sure that removing unused dimension elements is going to make your system that much faster? Have you done any testing, comparing the full-loaded dimensional models against one where you've removed the unnecessary ones? The reason I ask is that TM1 does a really good job of not allocating memory to empty cells in the cubes, they call it their sparse consolidation algorithm. Weeding out these unused elements may not necessarily make your system perform that much better. Of course there is way more to it than that, and I personally wouldn't want the unused elements there either, but it may not be a silver bullet for you.
Although speed is a primary concern, it is not the most important. there is a complicating factor. We are distributing reports / analyses through Cognos BI. We are not sure of the details on how C BI and TM1 interoperate, but have clearly demonstrated to ourselves that a suppress zeros operation occurs largely on the Cognos BI side (or at least is more expensive than a non-compress zeros operation). We did this testing in Analysis Studio, monitoring the TM1 server and the C8 servers with various types of queries. Though TM1 may do a good job of handling sparse data, the query results will frequently still have the elements with no data in the display or will take processing time to remove with a suppress operation.

This also does not address the user concern of expanding the dimension and just seeing the elements there. In the end, the elements have to be deleted.
tomok wrote: If it were me I would do my dimension maintenance TI processes against a view where I joined the "fact" table(s) (the one with the data) against the "metadata" table(s). If there is no data in the fact table then it won't appear in the view (due to the magic of the SQL JOIN function). If you have multiple fact tables the SQL may get complicated with multiple JOINS but it should work. if the SQL gets hairy you can create separate views as stepping stones and then join them all together in the end to get the final result. If I were to do a TM1 model where there may be "extra" records in the metadata tables I would join it against the fact table(s) so that those additional records wouldn't become part of the dimension.
We actually did some testing of SQL queries much like you discussed. It was not a practical solution. It was slow and just felt clumsy. It got worse as we added more and more facts. (And we are still adding queries). Thus we ruled it out.

Re: Delete Elements Containing No Data in any Cube

Posted: Thu Mar 24, 2011 10:21 pm
by normanbobo
lotsaram wrote:You could introduce the concept of "meta data control" dimensions whcih would essentially be coppies of the actual dimensions but are created from processing zero suppressed (leaf only!) views. You then just re-process the main dimensions and do a Dimix test on its meta data control analouge and just zap the elements not in the meta data control dimension. You might then need to do a round of removing orphan consolidations and consolidations with no ultimate leaf descendants.

A little time consuming and processing intensive but sure to work.
Thanks for contributing to yet another post, Lotsaram. Interesting idea on the "meta data control" dimensions.

Our biggest issue with our original design was creating all of the views, which generated tremendous quantities of locks. In the end, this was the core issue of that design ... running the view without generating locks. What's frustrating is there is no way to tell TM1 ... I'm here by myself in batch mode...don't generate locks. There are related commands, but we could not get them to actually work. How might we control that issue under this proposal?

Re: Delete Elements Containing No Data in any Cube

Posted: Thu Mar 24, 2011 10:34 pm
by normanbobo
Lotsaram ... I just found the following post here on the forum which is similar to your idea. But it does not reference the issue of locking.

http://www.tm1forum.com/viewtopic.php?f=3&t=32

Re: Delete Elements Containing No Data in any Cube

Posted: Fri Mar 25, 2011 12:34 pm
by tomok
i still think you are barking up the wrong tree here. We are all TM1 developers and we never want to admit that TM1 can't, or shouldn't do something but what you are trying to do cannot be done in TM1 without a lot of jury-rigging, while for a SQL-based system it is a piece of cake, technology-wise. Any SQL person worth anything could put the necessary queries together to solve your problem pretty quickly. I'm not a certified SQL person and I know I could do it pretty easily. You talked about it being kludgy and slow. Perhaps that is because you aren't filtering the fact table down properly before the JOIN. For example, there is no need to JOIN the full fact tables against the meta data tables because all your are interested is if there is at least one record for that department, product, or whatever in the fact table. So, use DISTINCT against the fact table first, or use GROUP BY to get a summarized view. You don't need to JOIN against every single record in the fact table, just the unique combinations of what are in your dimensions. You should also consider doing the JOIN building routine as an SSIS package, putting the results of the various queries into temporary tables as you work your way through the fact tables. Focus some of that creativity you are trying to apply to TM1 into SQL, you're going to have better success there because it's a much better development environment for what you are tring to accomplish.

Re: Delete Elements Containing No Data in any Cube

Posted: Fri Mar 25, 2011 8:19 pm
by PlanningDev
Can you take the current process you have now and modify it by
  • 1. Adding an Absolute Value Measure to all cubes and creating a rule to add together at N levels the ABS value of all other Measures.
    2. Create a TI to check this measure for all cubes and set your same "Has Data Flag" flag to yes or no.
    3. At the end of setting the flag based on all cubes you would have the same result, all items flagged that did not have values.
    4. Now run the same TI to delete elements based on these values.
    5. Now you can make dimensional changes and produce the result without a full cube reload.

Re: Delete Elements Containing No Data in any Cube

Posted: Sat Mar 26, 2011 2:42 pm
by paulsimon
Hi Norman

Your core issue is that when Cognos BI talks to TM1, Cognos BI does the suppression after extracting from TM1. I believe that IBM have now cured this issue. I would go back to IBM for more details on this. Then you don't need to bother about removing unused elements.

Regards

Paul

Re: Delete Elements Containing No Data in any Cube

Posted: Mon Mar 28, 2011 1:37 pm
by normanbobo
tomok wrote:i still think you are barking up the wrong tree here. We are all TM1 developers and we never want to admit that TM1 can't, or shouldn't do something but what you are trying to do cannot be done in TM1 without a lot of jury-rigging, while for a SQL-based system it is a piece of cake, technology-wise. Any SQL person worth anything could put the necessary queries together to solve your problem pretty quickly. I'm not a certified SQL person and I know I could do it pretty easily. You talked about it being kludgy and slow. Perhaps that is because you aren't filtering the fact table down properly before the JOIN. For example, there is no need to JOIN the full fact tables against the meta data tables because all your are interested is if there is at least one record for that department, product, or whatever in the fact table. So, use DISTINCT against the fact table first, or use GROUP BY to get a summarized view. You don't need to JOIN against every single record in the fact table, just the unique combinations of what are in your dimensions. You should also consider doing the JOIN building routine as an SSIS package, putting the results of the various queries into temporary tables as you work your way through the fact tables. Focus some of that creativity you are trying to apply to TM1 into SQL, you're going to have better success there because it's a much better development environment for what you are tring to accomplish.
I have 15 years of ETL and Data Warehousing experience -- I know this can be done in SQL and/or ETL tools. However, the client does not wish to introduce any more ETL than is absolutely necessary for this system -- they do not want to have to manage an ETL infrastructure for this system -- if it can be avoided. The solution we already have in place, to them, is better than an ETL solution. We would, however, like to improve the TM1 process even more if possible.

Re: Delete Elements Containing No Data in any Cube

Posted: Thu Mar 31, 2011 1:20 pm
by normanbobo
PlanningDev wrote:Can you take the current process you have now and modify it by
  • 1. Adding an Absolute Value Measure to all cubes and creating a rule to add together at N levels the ABS value of all other Measures.
    2. Create a TI to check this measure for all cubes and set your same "Has Data Flag" flag to yes or no.
    3. At the end of setting the flag based on all cubes you would have the same result, all items flagged that did not have values.
    4. Now run the same TI to delete elements based on these values.
    5. Now you can make dimensional changes and produce the result without a full cube reload.
Thanks for the post. The absolute value function is a creative solution to handling "cells with any values" rather than looking for zeros. There is one challenge to this. So far we have been able to avoid having any rules in the cube at all. This is not your normal TM1 cube built for planning -- it will be used purely for reporting. With the size of the cube and the potential volume of users we are avoiding rules with a vengeance and pushing all of the work into batch TI processes. But we could possibly run a process that implements the equivalent of the rule. I'll be giving this some more thought ....

Re: Delete Elements Containing No Data in any Cube

Posted: Thu Mar 31, 2011 1:26 pm
by normanbobo
PaulSimon wrote:Hi Norman

Your core issue is that when Cognos BI talks to TM1, Cognos BI does the suppression after extracting from TM1. I believe that IBM have now cured this issue. I would go back to IBM for more details on this. Then you don't need to bother about removing unused elements.

Regards

Paul
Paul ... I believe you are correct that there are performance improvements in C10 around TM1 data access. That will certainly help. However, query performance is not the only issue. The users do not wish to see the unused elements at all -- not even in the metadata trees in Analysis Studio and other interfaces. They go so far as to consider them "bugs" in the system that they even appear. We are also concerned about the existence of these elements and how to remove them when we start building multidimensional reports in Report Studio .. but we have not gone that far yet.

Re: Delete Elements Containing No Data in any Cube

Posted: Thu Mar 31, 2011 4:06 pm
by blackhawk
Here is another option for you to consider, but this works only if the system doesn't allow users to make data changes (i.e. reporting system rather than a planning system).

I have a customer that has 2million products in a dimension with attributes galore. This dimension is an SKU dimension, which as you can surmise, gets quite cluttered with new / discontinued and non-restocked items. On top of that, since we only load a window of data into TM1 (3 periods at this detail level), we needed to have control over what is really valid and not within TM1.

So, what we did was this:

1. First create a flag, (i.e. InTM1) on the products master table in your data mart.
2. Load the sales data into the data mart tables, as done normally
3. Clear the InTM1 flag in the products master table
4. Perform a query that returns all the products sold within the window where there was activity returning only the product IDs.
5. Update the products master table with a flag of InTM1 = true
6. Clear the InTM1 flag from all the product dimension members in TM1
7. Load the products dimension master into TM1 along with this flag during the TI load
8. After the load is complete, rip through the dimension members and remove all dimension members without this flag set to true.

What you should end up with then are only those products that have activity within the window of time that TM1 holds. New products are taken care of and discontinued or unused products don't show up in the dimension. Best of all, it doesn't require a cube re-load.

Not sure if this applies to your situation, but it is another option to consider.

Re: Delete Elements Containing No Data in any Cube

Posted: Thu Mar 31, 2011 4:22 pm
by normanbobo
PaulSimon wrote:Hi Norman

Your core issue is that when Cognos BI talks to TM1, Cognos BI does the suppression after extracting from TM1. I believe that IBM have now cured this issue. I would go back to IBM for more details on this. Then you don't need to bother about removing unused elements.

Regards

Paul

Paul, thanks for the post. I believe you are correct in stating that IBM has done some work around suppression in C10, but I don't have the evidence to prove it. In any case, C10 will certainly improve our query speeds. However, our users have requested that the unused dimension elements not even appear in the metadata trees, for instance in Analysis Studio. They go so far as to call them "bugs" since they have no data. Of course we know this is not true (and they understand), but they don't want them there. On top of that, we are uncertain as to the issues we will face trying to remove them from reports when we start developing "standard reports" in Report Studio. All in all ... it is necesary that they be completely removed.

Re: Delete Elements Containing No Data in any Cube

Posted: Thu Mar 31, 2011 7:15 pm
by Michel Zijlema
PaulSimon wrote:Hi Norman

Your core issue is that when Cognos BI talks to TM1, Cognos BI does the suppression after extracting from TM1. I believe that IBM have now cured this issue. I would go back to IBM for more details on this. Then you don't need to bother about removing unused elements.

Regards

Paul
What I've found, testing Cognos 8.4 on TM1 9.4, is that there is a huge difference in performance between the inbuilt zero suppression - which, as Paul mentions, is working client side and is bad, especially on stacked row dimensions - and using filters (filtering out zeroes) - which seems to work server side (I guess in the MDX) and where performance is much better. Filtering however needs to be set per dimension, which is less intuitive.

Michel

Re: Delete Elements Containing No Data in any Cube

Posted: Thu Oct 02, 2014 11:02 am
by mithun.mistry11
Hello friends

I have a similar issue...I am using DimensionDeleteAllElements function in a TI to delete current data and recollect data from cubes. But what can I use if I want to say

"If there is anything less than 1, carry on to data and epilog?"

I have used the DimensionDeleteAllElements in prolog in TI.

Thank you