Hi,
We have two different customers exhibiting broadly similar issues.
The total RAM in the box doesn't add up to what Task Manager shows. i.e. the VM has 72GB of RAM and the PA services total 35GB, yet the VM is running at 95% RAM consumption, with ~30GB "missing".
Adding more RAM to the box just means that the missing component increases.
Whenever a TI runs, processor consumption goes straight to 100%, which shouldn't be possible for a single-threaded process. The VM has 16 cores assigned.
We see the missing RAM component even when the PA services are not running, e.g. after a restart with the PA services set to Manual.
Both customers insist that the VM is configured so that its resources are fully reserved for it by the host.
My money was on the VM host allocating resources dynamically and taking processor and RAM back from the PA VM, but as mentioned above I've been told this is not the case.
Anyone have any ideas what is going on here and anything else we can ask the customers about the configuration of the VM?
I think you are correct and it's a VM resource-sharing issue... as I did come across something like this at a customer's site once, where the VM was 'sharing' the RAM with another application. We noticed that TM1 was sluggish at times and this was down to the 'other' application writing to its SQL back end at the time. IT denied it initially, but later admitted they had set this up without telling anyone.
We asked IT to either move the other application to another VM box or remove this function and things went back to normal.
Might not be the case here, but it sounds very familiar.
It could be "Driver locked memory" if you are using Hyper-V. VMWare had a similar "balloon driver" at some point and may still. Sysinternal RAMMap would help pin it down, but I have seen VM servers at 95% of RAM utilization, with about 40% due to a "Driver Locked" status in RAMMap. It didn't make me warm and fuzzy. Dynamic Memory was the culprit.
What is driver locked memory, and is this a problem?
Driver locked memory is when a kernel-mode driver prevents memory pages from being swapped to the page file. It is through this mechanism that Hyper-V varies the amount of memory available to a guest when Dynamic Memory is enabled. In the case above, the Hyper-V Manager may show the guest using only about 50% of its maximum allocated memory, with the remaining 50% "locked" by the Hyper-V integration services drivers. VMware uses the same process through its balloon driver to reclaim guest memory.
For most applications, this locked memory is not going to cause a problem as Hyper-V will release memory as the amount of available memory lowers (the buffer threshold can be set on the guest properties). However, some applications, like Microsoft SQL Server, will try to manage their own memory usage based on the available memory. In that case, Dynamic Memory could be a problem.
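If Hyper-V is in play, a quick check from the host shows whether Dynamic Memory is enabled and how assigned memory compares with what the guest is demanding. This is a sketch, assuming the Hyper-V PowerShell module on the host; "PA-VM" is a placeholder name:

```powershell
# Run on the Hyper-V host; "PA-VM" is a placeholder VM name.
# DynamicMemoryEnabled = True plus a large gap between MemoryAssigned
# and MemoryDemand is consistent with ballooned / driver-locked pages.
Get-VM -Name "PA-VM" |
    Select-Object Name, DynamicMemoryEnabled, MemoryStartup, MemoryAssigned, MemoryDemand
```

Inside the guest, RAMMap's Use Counts view shows the ballooned pages under the "Driver Locked" column.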
To disable Dynamic Memory and release the driver locked memory in Hyper-V, you will need to shut down the guest and untick Dynamic Memory in the guest's memory properties, then set the Startup memory to the previous maximum memory value.
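The same change can be scripted from the host. A sketch, assuming the Hyper-V PowerShell module; the VM name and the 72GB figure (from the example earlier in the thread) are placeholders:

```powershell
# VM must be off before changing the memory configuration.
Stop-VM -Name "PA-VM"

# Switch to static memory and pin the startup allocation to the old maximum.
Set-VM -Name "PA-VM" -StaticMemory -MemoryStartupBytes 72GB

Start-VM -Name "PA-VM"
```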
So as best we can tell this issue is being caused by the Planning Analytics Administration Agent (PAAA) spawning and dropping many PowerShell instances to look at the registry. The many, many instances of these hanging around in the background chew through all the RAM over time.
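A quick way to confirm that pattern on an affected server is to count the lingering PowerShell processes and total up their working sets. A minimal sketch, run in an elevated PowerShell session on the VM:

```powershell
# Count powershell.exe instances and sum their working sets in GB.
# Many instances with an old StartTime would support the PAAA theory.
$ps = Get-Process -Name powershell -ErrorAction SilentlyContinue
$gb = ($ps | Measure-Object WorkingSet64 -Sum).Sum / 1GB
"{0} powershell instances using {1:N1} GB" -f $ps.Count, $gb
```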
I raised a case (TS012607330) recently re the Planning Analytics Admin agent and high cpu usage on some of our environments (but not all environments).
They advised us after providing all relevant logs and other information, to check "Power setting in OS , BIOS and Hypervisor (if this is VM) as this will impact CPU utilization even on small real CPU tasks".
Didn't really get anywhere with it, so will be interested to know if you can squeeze anything useful out of them.
We also have an old ticket, since closed, regarding PAAA and high processor usage.
This only occurred when PAW was making a request to PAAA for the performance stats, and so was somewhat mitigated at the customer site by turning down the update frequency.
The processor spike seemed to be driven by the PAAA activity triggering virus scanning (despite the correct exclusions being set up) and PowerShell activity.
This was eventually tracked down to an unpatched log4j issue: when PAAA spun up it used log4j, which triggered the alert.
Assuming you are on a current release of PAAA this shouldn't be an issue anymore...