I know that this starts becoming a dark art, but I would like to rehash some of the old posts - like this one and this one. There are others, but the common theme is that there is generally no resolution or clear process for optimizing.
So the issue I am having right now is the following: I have a model with lots of data across 8 cubes (all different sources and uses, i.e. no overlap).
Cube MB
14.8 GB on disk - 834,673,540 populated numeric cells. Memory for input data is 42,823,840,768 bytes (≈42 GB). Roughly 51 bytes per cell.
There are 7 dimensions, ordered based on the data expected in terms of size and sparsity - dimensions with few, frequently used elements at the top, working down to the largest dimension just before the measures dimension. The third-last dimension has about 2,000 leaf (N) elements and 1,100 consolidated (C) elements. The second-last dimension has about 25,000 N elements and 100 C elements. The measures dimension has 8 elements, 5 of which are string - 1 rule-derived, the others not actually populated at this stage.
Cube SD
8 GB on disk - 543,177,497 populated numeric cells. Memory for input data is 13,035,897,856 bytes (≈13 GB). Roughly 24 bytes per cell.
Structure is similar to Cube MB but has two fewer dimensions. The two dimensions that are not used are Source (2 elements) and Type (50 elements). The measures dimension only has 3 elements - NO strings.
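For what it's worth, the bytes-per-cell figures above are simply memory for input data divided by populated numeric cells. A minimal sketch of that arithmetic, reusing the numbers quoted above:

```python
# Back-of-envelope check of the bytes-per-cell figures quoted above:
#   bytes_per_cell = memory_for_input_data / populated_numeric_cells
cubes = {
    "MB": {"memory_bytes": 42_823_840_768, "populated_cells": 834_673_540},
    "SD": {"memory_bytes": 13_035_897_856, "populated_cells": 543_177_497},
}

for name, stats in cubes.items():
    bytes_per_cell = stats["memory_bytes"] / stats["populated_cells"]
    print(f"Cube {name}: {bytes_per_cell:.1f} bytes per populated numeric cell")
# Cube MB: 51.3 bytes per populated numeric cell
# Cube SD: 24.0 bytes per populated numeric cell
```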
So I guess the question is really: how do I go about reducing the memory usage per cell in Cube MB from 51 bytes to a more reasonable value, which in turn will reduce the memory consumption?
- Managed to re-order dimensions manually and achieved a massive 0.0003% reduction! Moving most of the dimensions around resulted in zero change or an increase. The system optimization took about an hour and yielded 0.000000%.
- Removed string measures and restarted the server - no change.
Where to from here?
Populated Numeric Cells vs Memory for Input Data
- Tom (MVP)
Re: Populated Numeric Cells vs Memory for Input Data
gtonkin wrote: So I guess the question is really how do I go about reducing the memory usage per cell in Cube MB from 51 bytes to a more reasonable value which in turn will reduce the memory consumption.
What is a reasonable value in your opinion, considering that string data is inherently more memory-intensive than numeric data? Try removing the string measures from the cube and then check the memory footprint. That should tell you how much having strings is hurting you.
- gtonkin (MVP)
Re: Populated Numeric Cells vs Memory for Input Data
Thanks Tom - I did remove the strings, which contained no data - no savings on memory.
In terms of what is reasonable, I would guess somewhere around the 30-byte mark, based on other deployments.
In the past, the way to bring down the usage was by re-ordering the dimensions. My next plan is to swap the two big ones, restart, and see whether the reported values in the stats cube agree with the percentages in the re-ordering window. I have my suspicions at the moment.
- lotsaram (MVP)
Re: Populated Numeric Cells vs Memory for Input Data
In my experience, density is more important than size.
Rough-as-guts method to determine density:
- Create 1D views of the cube with all N elements on the row dimension and "Total" on all other dimensions, slice out to Excel and count the number of rows
- Density for each dimension is then populated N elements / total N elements for that dimension
Order the cube dense to sparse. Smaller dimensions will normally be naturally more dense anyway. See what improvement you get. If there are now no string measures, then there is absolutely no reason to leave the measures dimension last. That is a limitation of the system-suggested reordering, which means you can pretty much always beat it manually.
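A minimal sketch of that density arithmetic and the dense-to-sparse ordering, in Python - the dimension names and counts below are illustrative placeholders, not taken from a real model; plug in the row counts from the 1D slices instead:

```python
# Density per dimension = populated N elements / total N elements.
# The dimension names and counts are illustrative placeholders only.
dimension_counts = {
    "Source":  {"total_n": 2,     "populated_n": 2},
    "Period":  {"total_n": 1136,  "populated_n": 750},
    "Store":   {"total_n": 25323, "populated_n": 19245},
    "Product": {"total_n": 1979,  "populated_n": 20},
}

densities = {
    dim: counts["populated_n"] / counts["total_n"]
    for dim, counts in dimension_counts.items()
}

# Order the cube dense to sparse: highest density first.
for dim in sorted(densities, key=densities.get, reverse=True):
    print(f"{dim:<8} density = {densities[dim]:6.1%}")
```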
I think IBM's cube sizing benchmark is either 22 or 24 bytes per numeric cell for x64. This is almost always achievable. If you get below 22 bytes/cell then you are doing very, very well. Over 50 is too much; I'm pretty sure you will be able to save half of this. But 50 isn't terrible - I have audited systems where it was over 150, which was then brought down to under 30.
After doing a reorder, the numbers in StatsByCube get funky and can't be trusted. From memory, I think total memory used is still reliable, but memory for input data stays as it was and memory for feeders shows as negative to balance it out (or vice versa) - it's weird. You need to save data, unload and reload the cube, and restart perfmon to get reliable stats for reordered cubes.
- Marcus
Re: Populated Numeric Cells vs Memory for Input Data
Hi
An approach I have found quite useful for reducing cube sizes is to convert N-calculations into C-calculations where possible, which lets me save the feeders. For example, if I have a KPI ['Rev per FTE'] = ['Revenue'] \ ['FTE'], then I can save the feeder by making ['Rev per FTE'] a C-element and assigning ['FTE'] as a child; the rule will then not require a feeder.
Not sure if this works for you.
- gtonkin (MVP)
Re: Populated Numeric Cells vs Memory for Input Data
Thanks Marcus. The cubes have populated data at this stage - I have removed rules to ensure I focus on the main issue, i.e. data points vs memory for those points.
I do however use the technique you describe in other areas/models.
- gtonkin (MVP)
Re: Populated Numeric Cells vs Memory for Input Data
lotsaram wrote: From my experience density is more important than size.
Rough as guts method to determine density:
- Create 1D views of the cube with all N elements on row dimension and "total" on all other dimensions, slice out to Excel and count the number of rows
- density on each dimension is then populated N elements / total N elements for the dimension
Order the cube dense to sparse. Smaller dimensions will normally be naturally more dense anyway. See what improvement you get. If there are now no string measures then absolutely no reason to leave the measure dimension as last. ...
Thanks Lotsa - will go through your suggestion and post back once I have something useful to report.
- gtonkin (MVP)
Re: Populated Numeric Cells vs Memory for Input Data
Ran the density exercise per Lotsa and have the following:

Cube MB | All | C Levels | N Levels | Density | Order |
Marketing Budget Source | 2 | 0 | 2 | 100% | 1 |
Scenario | 3 | 0 | 3 | 67% | 4 |
Period Date | 1176 | 40 | 1136 | 66% | 5 |
Spend Type | 76 | 25 | 51 | 73% | 3 |
Marketing Store | 25954 | 631 | 25323 | 76% | 2 |
Marketing Product | 3120 | 1141 | 1979 | 1% | 6 |
Marketing Budget Measures | 3 | 0 | 3 | 100% |

I re-ordered the dimensions based on the Order column in the table above, keeping measures last as I will need to include the strings again at some point.
I saved data, restarted the instance and Performance Monitor, and checked memory usage again.
Memory used for input data: 42,823,840,768 - previously 42,823,840,768 - no change!
The only other thing I can think of is reducing the number of dimensions to see how this impacts the cube size. The rationale is that each cell requires an additional pointer for each additional dimension added, and each pointer would consume some number of bytes in the cube - more dimensions equals more pointers equals more storage required. Will try to test this theory further.
Any other suggestions in the meantime are welcome.
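To make that rationale concrete, here is a toy back-of-envelope model of the "one pointer per dimension per cell" idea - the per-value and per-pointer byte counts are arbitrary assumptions for illustration, not documented TM1 storage internals:

```python
# Toy model of the "more dimensions -> more per-cell overhead" idea.
# BYTES_PER_VALUE and BYTES_PER_POINTER are arbitrary assumptions for
# illustration only - they are NOT documented TM1 storage internals.
BYTES_PER_VALUE = 8     # assumed bytes to hold the numeric value itself
BYTES_PER_POINTER = 4   # assumed per-dimension addressing overhead per cell

def estimated_bytes_per_cell(num_dimensions: int) -> int:
    # Per-cell cost = value storage + one assumed pointer per dimension.
    return BYTES_PER_VALUE + num_dimensions * BYTES_PER_POINTER

for dims in (5, 7):  # Cube SD has 5 dimensions, Cube MB has 7
    print(f"{dims} dimensions -> ~{estimated_bytes_per_cell(dims)} bytes/cell under this toy model")
```

Under this kind of model the per-cell footprint grows linearly with the number of dimensions, which is the effect the dimension-reduction test is trying to measure; the actual constants in TM1 will differ.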