Cube size shows no influence of dimension elements deleted
Posted: Mon Nov 12, 2012 4:25 am
by Olivier
Dear all,
I am confused today as I am playing with a cube which has a very large dimension (approx. 500,000 elements).
The cube in question is a staging cube that contains close-to-transactional information.
I understand this is a questionable design, but that is not the point of my concern...
After processing several months of data into that cube, it has reached a size close to 260 megabytes on the server hard drive.
The large dimension contains roughly 500k elements with data associated at that point in time.
The case:
I then delete all elements from the large dimension except for 12.
The update on the dimension is made using an XDI file upload.
No data is attached to the 12 elements remaining in the dimension in that cube
(only a couple of attributes are populated for these, but obviously they are stored in the dimension's attribute cube rather than in this cube...).
To my surprise, on the hard drive the size of the cube does not change and remains 260 megabytes once the 500k dimension elements (all except the 12 with no data) are deleted, even if:
- I restart the server
- I execute a Save Data All.
On the other hand, as expected, the memory used by the server after the restart clearly reflects the reduction in size, as it went from approximately 4.5 gigabytes used to 1.5 gigabytes after the deletion + restart.
The questions:
Shouldn't the cube size on the server hard drive reflect, to a certain extent, the actual data volume stored in the cube?
Could somebody please help me understand the rationale behind this?
Re: Cube size shows no influence of dimension elements deleted
Posted: Mon Nov 12, 2012 6:14 am
by Olivier
Interestingly, the cube data size on the drive finally dropped after I reprocessed data into the cube, generated new elements in the dimension and did another save data.
I am not sure why, in that sequence, the cube size would not drop as soon as elements are deleted and the data saved or the server restarted.
Re: Cube size shows no influence of dimension elements deleted
Posted: Mon Nov 12, 2012 8:21 am
by Alan Kirk
Olivier wrote: Interestingly, the cube data size on the drive finally dropped after I reprocessed data into the cube, generated new elements in the dimension and did another save data.
I am not sure why, in that sequence, the cube size would not drop as soon as elements are deleted and the data saved or the server restarted.
I can make an educated guess: it's a data (not metadata) change that "flags" to TM1 whether it needs to re-save a particular cube. If no change has been flagged since the last data save, the cube will not be saved even if you do a Save Data All. This is a time-saving feature, since it would be pointless for TM1 to re-save each and every cube regardless of whether it had changed; after all, the save process is often the primary performance bottleneck. You don't actually lose much by retaining the data in the .cub file; all that will happen on startup is that the data that no longer has a valid element will fail to load.
The deletion of data via the deletion of elements is not (flagged as) an actual data change. (Were it otherwise, after each metadata change the server would need to somehow go through and check whether any populated cells had been lost, which would be a ridiculously time-consuming task.) Consequently the cube was not flagged for a save after you made your change, and thus it retained the same size. My bet is that had you checked the time and date of the .cub file, it would have been from the save prior to your deleting the elements.
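If you ever need to force the file back in line with reality, one idea (untested, and a sketch only: the cube, element and measure names are made up, I'm assuming a simple two-dimensional cube, and whether a write-and-revert is enough to set the flag may depend on the version) is to make a trivial genuine data change after the deletion so the cube is marked dirty, then save:

  # TI sketch - 'StagingCube', 'SomeElement' and 'SomeMeasure' are placeholders.
  # Nudge one cell so TM1 flags the cube as changed, then put it back and save.
  nOld = CellGetN('StagingCube', 'SomeElement', 'SomeMeasure');
  CellPutN(nOld + 1, 'StagingCube', 'SomeElement', 'SomeMeasure');
  CellPutN(nOld, 'StagingCube', 'SomeElement', 'SomeMeasure');
  SaveDataAll;

That said, as noted above, you don't lose much by leaving the stale data in the file; it simply fails to load next time.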
Re: Cube size shows no influence of dimension elements deleted
Posted: Mon Nov 12, 2012 10:25 pm
by Olivier
Thanks for taking some time to comment, Alan.
I was just curious to understand the behaviour a bit better...
Alan Kirk wrote: My bet is that had you checked the time and date of the .cub file, it would have been from the save prior to your deleting the elements.
Sounds a safer bet than "Americain" at the Melbourne Cup.
I do not recall the timestamp on the .cub file... but the save data post element deletion was much quicker than I expected...
so I think your guess is very accurate...
Re: Cube size shows no influence of dimension elements deleted
Posted: Wed Nov 14, 2012 1:02 pm
by Harvey
Yes, I have noticed this too. It seems TM1 doesn't drop data when a dimension changes, nor does it flag the data as changed so that the cube would be included in a Save Data All.
It's pretty stupid in this case, but TM1 has a bias toward speed optimization as opposed to memory or disk efficiency, so I guess that's the reason for the behaviour.
Re: Cube size shows no influence of dimension elements deleted
Posted: Thu Nov 15, 2012 3:55 am
by Olivier
Harvey wrote: It's pretty stupid in this case, but TM1 has a bias toward speed optimization as opposed to memory or disk efficiency
I am wondering if in this instance it actually incurs a side effect on calculation performance.
Assume the data is flagged as unchanged, as Alan guessed, which would explain the save data timing, the cube size on the drive and, as a result, the size in memory.
Would it be fair to assume, then, that when the cube is up and running in memory, having lost all these elements but not having flagged the associated data as changed, the consolidation calculations and/or rules will face some sort of overhead when a view is called and calculated (due to the virtual state of old invalid data that no longer has elements to sit against)?
I will try to test that when I have a chance.
I have found that the selective mass deletion of elements is quite slow. I use an attribute to identify the set of elements that have to be deleted.
To avoid impacting the business, this housekeeping process is fired on Friday nights, when the conditions to trigger the (periodic) chore are met,
and my hope was to have a fresh, lighter system ready to go on Monday mornings.
The process executed to do this clean-up is quite straightforward, but the timing is about 5,000 seconds for approximately 500,000 elements; hence the Friday scheduling.
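For what it's worth, the clean-up is essentially a loop of this shape (a minimal sketch of the kind of process I mean, not my actual code - 'BigDim' and 'DeleteFlag' are made-up names; DimensionElementDelete is the standard TI call):

  # TI metadata tab sketch - dimension and attribute names are placeholders.
  # Walk the dimension backwards so deletions don't shift the indexes still to visit.
  i = DIMSIZ('BigDim');
  WHILE(i >= 1);
    sEl = DIMNM('BigDim', i);
    IF(ATTRS('BigDim', sEl, 'DeleteFlag') @= 'Y');
      DimensionElementDelete('BigDim', sEl);
    ENDIF;
    i = i - 1;
  END;

As I understand it, TI commits the dimension edits in one pass when the tab completes, so batching all the deletions into a single process should at least trigger the downstream recompile work once rather than per element.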
I think one of the implications of this is that my system is only "really" fresh and light once the next load + save data all has been done on that particular cube.
Again, this is based on the assumption that calculation (or cube) performance is impacted by the data not getting cleared straight after element deletion.
Note:
It is a bad design coming back to bite me...
We should have built a relational database to capture this transactional data... but you don't always do what you could/should/want to...
Hopefully, if performance becomes an issue for users, I can show the path to a better practice...
Re: Cube size shows no influence of dimension elements deleted
Posted: Thu Nov 15, 2012 4:30 am
by Harvey
I would say it's unlikely to cause a performance hit.
Internally, when a consolidated element is selected in a view, TM1 finds all the leaf elements under the consolidation to determine which values to sum. Additional data sitting on disk, or even in memory, wouldn't affect this process, regardless of feeder flags or cached calculated values.
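Conceptually it's the equivalent of something like this (a rough illustration only, not what the engine literally does - 'BigDim', 'Total', 'StagingCube' and 'Amount' are made-up names, and I'm assuming a simple two-dimensional cube):

  # Rough illustration - the engine does this internally, and far faster.
  nSum = 0;
  i = 1;
  WHILE(i <= DIMSIZ('BigDim'));
    sEl = DIMNM('BigDim', i);
    # Only leaf elements that roll up under 'Total' contribute to the sum.
    IF(ELLEV('BigDim', sEl) = 0 & ELISANC('BigDim', 'Total', sEl) = 1);
      nSum = nSum + CellGetN('StagingCube', sEl, 'Amount');
    ENDIF;
    i = i + 1;
  END;

The point being: once the elements are gone from the dimension, they can never appear in that walk, so any stale cell data keyed to them is simply never consulted.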
I'm not convinced this additional data exists in memory just because it has not been re-saved to disk yet. Remember that the RAM TM1 reports to the OS as being "in use" is just the RAM it has reserved for use. TM1 doesn't tend to give back RAM that it has allocated (again, a bias toward speed and not efficiency), so you're unlikely to see a drop in reserved RAM even if it is no longer used.
However, it's worth testing if you can find a way to do so, and please share your findings here, as I'm sure they would be very interesting to other members.
Re: Cube size shows no influence of dimension elements deleted
Posted: Thu Nov 15, 2012 7:36 am
by Alan Kirk
Lazarus wrote: I would say it's unlikely to cause a performance hit.
Internally, when a consolidated element is selected in a view, TM1 finds all the leaf elements under the consolidation to determine which values to sum. Additional data sitting on disk, or even in memory, wouldn't affect this process, regardless of feeder flags or cached calculated values.
I'm inclined to agree with both your reasoning and conclusions. However...
Lazarus wrote: I'm not convinced this additional data exists in memory just because it has not been re-saved to disk yet.
I believe it does, at least in the session during which the deletion was done. (Not afterwards, though; there I agree.)
I just obliterated all but one element in a dimension of a decent-sized (100 meg) cube, carefully watching the performance monitor stats before and after, for both the server and the cube. The stats were exactly the same, even after a data save (which, as I'd guessed, didn't update the .cub date and time). Not a byte was removed from the cube, not a byte added to garbage. To get the reduction in memory usage I had to restart the server. During the load on restart, none of the values relating to the missing elements could be loaded (as I'd mentioned earlier), and thus the cube had a much smaller memory footprint.
The one test that I didn't do (since I couldn't do both in the one set) was to make a data change in the cube after the deletion to see whether that sent all of the surplus memory to garbage.
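For anyone who wants to repeat the check: with the performance monitor running, the per-cube figures can be read straight out of the }StatsByCube control cube, along the lines of the snippet below (a sketch - 'MyCube' is a placeholder, and the exact interval and measure names can vary by version, so check your own }StatsByCube first):

  # Requires the performance monitor to be running.
  # 'MyCube' is hypothetical; verify the element names in your }StatsByCube.
  nMem = CellGetN('}StatsByCube', 'MyCube', 'LATEST', 'Total Memory Used');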
Re: Cube size shows no influence of dimension elements deleted
Posted: Thu Nov 15, 2012 10:38 am
by Harvey
Youch, interesting result.
I wonder why, then, saving a dimension after element deletions tends to be slower on bigger, more complex cubes. I always assumed TM1 was spending that time freeing up memory.
Thanks for spending the time investigating it.
Re: Cube size shows no influence of dimension elements deleted
Posted: Thu Nov 15, 2012 11:43 am
by Duncan P
Every time you change a dimension, the rules of all cubes in which that dimension is used need to be recompiled. This recompilation can have a number of consequences, including cache invalidation, remapping of dependencies and regeneration of feeders. The larger and more complex those cubes are, the more time that can take.