Expand is SLOOOOOOOOOOW

declanr · Post by **declanr** » Mon Sep 13, 2021 10:05 am

Recently stuck a blog post up over on the IBM Community https://community.ibm.com/community/use ... ing-expand but long story short is that if you are using Expand to concatenate your variables up into a big string (for exporting to flat file or doing an ODBCInsert etc.) it is slower than doing the concatenation (and conversions from number to string) manually.

Or at least those were my findings.
I was working with a very large dataset and calling the Expand function many times, the time saving I got by swapping it out was very significant.
If you are only running a small number of records or doing something on the prolog/epilog it is unlikely you will notice a big difference but for large data sets, you might manage to shave some minutes off longer running processes by getting rid of the Expand function.

It's a super simple example but you can see the differences in timing when running something like below (unhash 1 at a time and see the differences):

Code: Select all

cOutputFile = 'C:\test.txt'; 
sString = 'String'; 
nNumeric = 1; 

iCount = 1; 
iMax = 10000000; 
While ( iCount <= iMax ); 
   #TextOutput ( cOutputFile, Expand ( '%sString%, %nNumeric%' ) ); 
   #TextOutput ( cOutputFile, sString, NumberToString ( nNumeric ) ); 
   #TextOutput ( cOutputFile, Expand ( '%sString%' ) ); 
   #TextOutput ( cOutputFile, sString ); 
   #TextOutput ( cOutputFile, Expand ( '%nNumeric%' ) ); 
   #TextOutput ( cOutputFile, NumberToString ( nNumeric ) ); 
   #TextOutput ( cOutputFile, Expand ( '%nNumeric%, %nNumeric%, %nNumeric%' ) ); 
   #TextOutput ( cOutputFile, NumberToString ( nNumeric ), NumberToString ( nNumeric ), NumberToString ( nNumeric ) ); 
   iCount = iCount + 1; 
End;

For me doing the export for 3 numeric variables took 105 seconds with expand vs 20 seconds without it.
Worth noting that in my actual process I was really using it for I had numerics that were not integers and I was just exporting to flat file to throw it all in a DB for further processing. I didn't need to worry about number formatting so NumberToString and/or Expand was sufficient.

I believe the biggest hit is the method used to convert numbers to strings in the Expand function is less efficient than that of NumberToString.
If you are only putting string variables in anyway there is a slight speed hit but not too significant... this make sense when you are essentially throwing an extra function into the mix that is arguably unnecessary.

It seems sort of obvious in hindsight that it would be slower but it's something I hadn't seen mentioned before or thought about until I needed to.
So thought I would share in case it could save someone else a little bit of a headache in the future.

PavoGa · Post by **PavoGa** » Mon Sep 13, 2021 3:00 pm

Very interesting!

I use EXPAND a lot, but do not have to deal with excessively large datasets. A few years ago I tested and compared using EXPAND vs IF conditionals for output that varied in the number of fields, and while EXPAND was slower, the difference was negligible (at the time) vs the gain in cleaner and more flexible code.

I have one or two things I may need to look at however. Thanks, Declan.

lotsaram · Post by **lotsaram** » Mon Sep 13, 2021 3:20 pm

This is pretty interesting. But also not that unsurprising that Expand isn't that fast. Unless there is big data volume and a significant incentive to optimize for performance I think I would usually stick with Expand as it makes for cleaner code and it's just so damm convienient for use cases like global variables and status messages.

declanr · Post by **declanr** » Mon Sep 13, 2021 3:56 pm

Yeah, my use case is a pretty extreme one.
I've always gone with Expand before (or at least since I knew how to use it) and have never needed to even consider it before but the process in question is a beast. So I ended up checking all sorts of things - like whether TextOutput or AsciiOutput would be faster (didn't make a noticeable difference if anyone is interested.)

Did pick up a couple of others that made quite extreme differences which I will also share when time permits and after sense checking them to make sure there weren't other things muddying the water.

For this one like you say Lotsa I will usually stick with Expand but if I have any more processes like this where the volumes are so large that it saves time in the minutes rather than seconds (in this case it was many minutes) I feel I would have to forego the clean aspect in favour of speed.
That being said, I have gone many years without needing to push this much data multiple times a day so it will probably be a while before I come across another project like this... or at least I hope so for my sanity.

ascheevel · Post by **ascheevel** » Mon Sep 13, 2021 4:23 pm

Great insight and thanks for sharing. Like others have posted, I'll continue to use EXPAND because it's so useful but will definitely be more aware of this if troubleshooting performance issues. It seems that the numeric conversion within EXPAND is the primary slowdown driver although your testing still shows material difference when working with strings only. Personally as a rule, I never pass numeric values to EXPAND. I either convert to string with NumberToString first or when the source is a cubeview and I'm simply textoutputting the cube value, I define all variables as string and let the TI do the conversion automatically before feeding the variable to the EXPAND function.

Below is a modification of your test where I added an sNumeric variable where nNumeric is converted to string with NumberToString and that string variable is then passed to the EXPAND function. When I then conduct your "1 String, 1 Numeric" test with that method, I get similar performance difference as your "1 string" test where concatenation is still about 15% faster than EXPAND.

Code: Select all

cOutputFile = 'C:\test.txt'; 
sString = 'String'; 
nNumeric = 1; 

iCount = 1; 
iMax = 10000000; 
While ( iCount <= iMax ); 
   # test 1
      #sNumeric = NumberToString(nNumeric)
      #TextOutput ( cOutputFile, Expand ( '%sString%, %sNumeric%' ) ); 
   
   # test 2
      #TextOutput ( cOutputFile, sString, NumberToString ( nNumeric ) ); 
   
   iCount = iCount + 1; 
End;

TM1 Forum

Expand is SLOOOOOOOOOOW

Expand is SLOOOOOOOOOOW

Re: Expand is SLOOOOOOOOOOW

Re: Expand is SLOOOOOOOOOOW

Re: Expand is SLOOOOOOOOOOW

Re: Expand is SLOOOOOOOOOOW