Populated Numeric Cells vs Memory for Input Data

gtonkin
MVP
Posts: 1259
Joined: Thu May 06, 2010 3:03 pm
OLAP Product: TM1
Version: Latest and greatest
Excel Version: Office 365 64-bit
Location: JHB, South Africa

Populated Numeric Cells vs Memory for Input Data

Post by gtonkin »

I know this is becoming a bit of a dark art, but I would like to rehash some of the old posts - like this one and this one. There are others, but the common theme is that there is generally no resolution or repeatable process for optimizing.

So the issue I am having right now is the following: I have a model with lots of data across 8 cubes (all different sources and uses, i.e. no overlap).
Cube MB
14.8 GB on disk - 834,673,540 populated numeric cells. Memory used for input data is 42,823,840,768 bytes (roughly 42 GB), i.e. roughly 51 bytes per cell.
There are 7 dimensions, ordered based on the expected size and sparsity of the data: dimensions with few elements, used frequently, at the top, going towards the largest dimension just before the measures dimension. The third-last dimension has about 2,000 N-level and 1,100 C-level elements. The second-last dimension has about 25,000 N-level and 100 C-level elements. The measures dimension has 8 elements, 5 of them string - 1 rule-derived, the others not actually populated at this stage.

Cube SD
8 GB on disk - 543,177,497 populated numeric cells. Memory used for input data is 13,035,897,856 bytes (roughly 13 GB), i.e. roughly 24 bytes per cell.
The structure is similar to Cube MB but with two fewer dimensions: the Source (2 elements) and Type (50 elements) dimensions are not present. The measures dimension has only 3 elements - NO strings.
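For what it's worth, the bytes-per-cell figures above are just memory for input data divided by populated numeric cells - a quick back-of-the-envelope check in plain Python, nothing TM1-specific:

# Back-of-the-envelope check of the bytes-per-cell figures quoted above.
cubes = {
    "MB": {"populated_cells": 834_673_540, "input_memory_bytes": 42_823_840_768},
    "SD": {"populated_cells": 543_177_497, "input_memory_bytes": 13_035_897_856},
}

for name, c in cubes.items():
    print(f"Cube {name}: {c['input_memory_bytes'] / c['populated_cells']:.1f} bytes per populated cell")

# Prints roughly 51.3 for Cube MB and 24.0 for Cube SD.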

So I guess the question really is: how do I go about reducing the memory usage per cell in Cube MB from 51 bytes to a more reasonable value, which in turn will reduce overall memory consumption?

- Managed to re-order dimensions manually and achieved a massive 0.0003% reduction! Moving most of the dimensions around resulted in zero change or an increase. System optimization took about an hour and yielded a 0.000000% reduction.
- Removed the string measures and restarted the server - no change.

Where to from here?
BR, George.

Learn something new: MDX Views
tomok
MVP
Posts: 2836
Joined: Tue Feb 16, 2010 2:39 pm
OLAP Product: TM1, Palo
Version: Beginning of time thru 10.2
Excel Version: 2003-2007-2010-2013
Location: Atlanta, GA

Re: Populated Numeric Cells vs Memory for Input Data

Post by tomok »

gtonkin wrote: So I guess the question really is: how do I go about reducing the memory usage per cell in Cube MB from 51 bytes to a more reasonable value, which in turn will reduce overall memory consumption?
What is a reasonable value in your opinion, considering that string data is inherently more memory intensive than numerical data? Try removing the string measures from the cube and then check the memory footprint. That should tell you how much having strings is hurting you.
Tom O'Kelley - Manager Finance Systems
American Tower
http://www.onlinecourtreservations.com/

Re: Populated Numeric Cells vs Memory for Input Data

Post by gtonkin »

Thanks Tom - I did remove the string measures, which contained no data - no savings on memory.
In terms of what is reasonable, I would guess somewhere around the 30 bytes per cell mark, based on other deployments.
In the past, the way to bring down the usage was to re-order the dimensions. My next plan is to swap the two big ones, restart, and see whether the reported values in the stats cube agree with the percentages in the re-ordering window. I have my suspicions at the moment.
BR, George.

Learn something new: MDX Views
lotsaram
MVP
Posts: 3702
Joined: Fri Mar 13, 2009 11:14 am
OLAP Product: TableManager1
Version: PA 2.0.x
Excel Version: Office 365
Location: Switzerland

Re: Populated Numeric Cells vs Memory for Input Data

Post by lotsaram »

In my experience, density is more important than size.
Rough-as-guts method to determine density:
- Create a 1D view of the cube per dimension, with all of that dimension's N elements on rows and "Total" on every other dimension; slice it out to Excel (with zero suppression on) and count the number of rows.
- Density for that dimension is then populated N elements / total N elements.

Order the cube dense to sparse; smaller dimensions will normally be naturally denser anyway. See what improvement you get. If there are now no string measures, there is absolutely no reason to leave the measures dimension last. Keeping it last is a limitation of the system-suggested reordering, which means you can pretty much always beat it manually.
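If it helps, here is a rough sketch of that calculation in plain Python - the dimension names and counts below are just placeholders for whatever your 1D views give you:

# Rank dimensions dense -> sparse from the 1D view counts.
# populated_n = rows in the zero-suppressed 1D view, total_n = N-level elements in the dimension.
# All names and numbers below are placeholders.
counts = {
    "Source":  {"populated_n": 2,     "total_n": 2},
    "Store":   {"populated_n": 19000, "total_n": 25000},
    "Product": {"populated_n": 1800,  "total_n": 2000},
}

density = {dim: c["populated_n"] / c["total_n"] for dim, c in counts.items()}

# Densest first; handle the measures dimension separately (last only if it holds strings).
for dim in sorted(density, key=density.get, reverse=True):
    print(f"{dim}: {density[dim]:.0%}")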

I think IBM's cube sizing benchmark is either 22 or 24 bytes per numeric cell on x64, and this is almost always achievable. If you get below 22 bytes/cell you are doing very, very well. Over 50 is too much - I'm pretty sure you will be able to save half of that - but 50 isn't terrible. I have audited systems where it was over 150 bytes/cell and was then brought down to under 30.

After doing a reorder, the numbers in StatsByCube get funky and can't be trusted. From memory, I think total memory used is still reliable, but memory for input data stays as it was and memory for feeders shows as negative to balance it out (or vice versa) - either way, it's weird. You need to save data, unload and reload the cube, and restart perfmon to get reliable stats for reordered cubes.
Please place all requests for help in a public thread. I will not answer PMs requesting assistance.
Marcus Dietrich IBM
Posts: 7
Joined: Mon Dec 19, 2016 11:56 pm
OLAP Product: PA TM1
Version: 2.0
Excel Version: 2013

Re: Populated Numeric Cells vs Memory for Input Data

Post by Marcus Dietrich IBM »

Hi
An approach I have found quite useful for reducing cube sizes is to convert N-level calculations into C-level calculations where possible, which lets me save the feeders. For example, if I have the KPI ['Rev per FTE'] = ['Revenue'] / ['FTE'], I can save the feeder by making 'Rev per FTE' a C-level element with 'FTE' as a child; the rule then does not require a feeder.

Not sure if this works for you.

Re: Populated Numeric Cells vs Memory for Input Data

Post by gtonkin »

Thanks Marcus. The cubes contain populated data at this stage - I have removed the rules to ensure I focus on the main issue, i.e. data points vs memory for those points.

I do however use the technique you describe in other areas/models.
BR, George.

Learn something new: MDX Views

Re: Populated Numeric Cells vs Memory for Input Data

Post by gtonkin »

lotsaram wrote: In my experience, density is more important than size.
Rough-as-guts method to determine density:
- Create a 1D view of the cube per dimension, with all of that dimension's N elements on rows and "Total" on every other dimension; slice it out to Excel (with zero suppression on) and count the number of rows.
- Density for that dimension is then populated N elements / total N elements.

Order the cube dense to sparse; smaller dimensions will normally be naturally denser anyway. See what improvement you get. If there are now no string measures, there is absolutely no reason to leave the measures dimension last. ...
Thanks Lotsa - will go through your suggestion and post back once I have something useful to report.
BR, George.

Learn something new: MDX Views

Re: Populated Numeric Cells vs Memory for Input Data

Post by gtonkin »

Ran the density exercise per Lotsa and have the following:
Cube MB                     All      C Levels   N Levels   Density   Order
Marketing Budget Source     2        0          2          100%      1
Scenario                    3        0          3          67%       4
Period Date                 1176     40         1136       66%       5
Spend Type                  76       25         51         73%       3
Marketing Store             25954    631        25323      76%       2
Marketing Product           3120     1141       1979       91%       6
Marketing Budget Measures   3        0          3          100%      -
I re-ordered the dimensions based on the Order column in the table above, keeping measures last as I will need to include the strings again at some point.
I saved data, restarted the instance and the performance monitor, and checked memory usage again.
Memory used for input data: 42,823,840,768, versus 42,823,840,768 previously - no change!

The only other thing I can think of is reducing the number of dimensions to see how this impacts the cube size. The rationale is that each cell requires an additional pointer for each additional dimension added, and each pointer would consume some number of bytes in the cube - more dimensions means more pointers means more storage required. I will try to test this theory further.
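Purely as an illustration of that theory (the per-dimension cost below is an assumed figure, not something documented), the potential saving would look something like this:

# Hypothetical: savings from dropping dimensions IF each extra dimension really
# costs a fixed number of bytes per populated cell. The 4 bytes is an assumption.
ASSUMED_BYTES_PER_DIM_INDEX = 4
POPULATED_CELLS = 834_673_540  # Cube MB

for dims_removed in (1, 2):
    saving_gb = dims_removed * ASSUMED_BYTES_PER_DIM_INDEX * POPULATED_CELLS / 1024**3
    print(f"Dropping {dims_removed} dimension(s): ~{saving_gb:.1f} GB, if the theory holds")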

Any other suggestions in the meantime are welcome.
BR, George.

Learn something new: MDX Views