Large Dimension Build Causes Large Memory Usage
Posted: Tue Sep 29, 2015 7:05 pm
Hello,
I'm currently in the process of creating a pretty large dimension (approx 1.9 million elements). I'm sourcing the dimension from an Oracle database (using an Oracle 11g 32-bit client). The behavior I'm seeing is a larger-than-normal (in my opinion) swell in memory as the recordset is iterated over (it's approx 1.3 million records). Specifically, prior to running the process the instance uses around 1 GB of memory; after running the process, it has grown to almost 100 GB. Watching Task Manager while the process is in the metadata tab, memory increases at a rate of almost 200-300 MB/sec. To help isolate the problem I've done the following:
- Removed the dimension from any existing cube
- As a starting point, I've completely cleared the dimension of all elements (except one)
- In my process I'm not establishing any of the parent/child relationships. In other words, the process is only adding the 1.9 million elements to the dimension (see the sketch after this list)
- I created a flat file with the same result set that is returned from my query. When I run the process with the flat file, memory only increases from 1 GB to around 2 GB (i.e. it seems reasonable)
- I then modified the process to do nothing in TM1 except build a text file representing the dataset being pulled from the database. Just spinning through the 1.3 million records led to an 80 GB increase in memory
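For reference, here is roughly what the stripped-down process looks like. The dimension name, variable name, and output path below are placeholders, not my actual names:

    # Metadata tab: element insert only, no parent/child rollups.
    # 'MyBigDim' and vElementName stand in for the real dimension
    # name and the ODBC source variable.
    DimensionElementInsert('MyBigDim', '', vElementName, 'N');

And the text-file-only variant that still shows the blowup:

    # Data tab of the test process: write each record out and
    # touch nothing in TM1. The output path is a placeholder.
    TextOutput('C:\temp\dim_dump.txt', vElementName);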
Once I stop and restart the service, the instance reverts to its original size. With that being said, I started the performance monitor expecting the server stats to show a whole bunch of memory placed into garbage. However, that wasn't the case: despite tm1sde.exe consuming nearly 100 GB, my }StatsForServer cube showed only 4 or 5 GB consumed.
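In case it matters, this is the kind of check I'm running against the stats cube. The time-interval and stat element names below are placeholders and may differ by version, so treat them as assumptions and check the cube's own dimensions for the exact names:

    # Rough sketch: read server memory stats from the
    # }StatsForServer control cube (the performance monitor
    # must be running for it to be populated). The argument
    # order must match the cube's actual dimensions; 'LATEST',
    # 'Memory Used' and 'Memory in Garbage' are placeholders.
    nUsed    = CellGetN('}StatsForServer', 'LATEST', 'Memory Used');
    nGarbage = CellGetN('}StatsForServer', 'LATEST', 'Memory in Garbage');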
One additional caveat: both the Oracle database and the TM1 database reside on Amazon Web Services, and I don't have a means of determining where their respective data centers are located. I would think that sort of thing could impact performance from an execution-time perspective, but I wouldn't expect it to have such a large impact on memory.
Any thoughts?
Thanks,
Brad