
Efficient rule

Posted: Fri Jun 20, 2014 11:32 am
by deepakjain2020
Hi All,

The cube has 7 dimensions.

Element1 & Element2 are from the Mdim dimension.
dim3 has elements A1, B1, C1, C2, A2, D2...

Element2 will be calculated only for elements starting with A or B in dim3.
Can someone please help me understand which statement is more efficient?

Statement1

['Element2'] = N: IF(SUBST(!dim3,1,1) @= 'A' % SUBST(!dim3,1,1) @= 'B', ['Element1'], STET);

Statement2
For this I will create an attribute which will have a 'Rep' value against elements starting with A & B.

['Element2']=DB('Cube',!dim1,!dim2,ATTRS('dim3',!dim3,'Rep'),!dim4,!dim5,!dim6,'Element1');


Regards,
Deepak Jain

Re: Efficient rule

Posted: Fri Jun 20, 2014 11:48 am
by declanr
I would go with the attribute-based rule, and that is without even bothering to think about efficiency. Checking whether the first letter is A or B may be all well and good now, but you need to allow for the possibility that a future development may require an extra element in that dimension called "APPLES" or "BANANAS"... With the attribute, as long as it's only used for that rule, you can guarantee that the functionality will keep working without interfering with any possible future changes.

Re: Efficient rule

Posted: Fri Jun 20, 2014 11:55 am
by Harvey
When any issue of efficiency comes up, it's always important to do some testing to determine the best solution. TM1 has some complicated internal logic that leads to quirky behaviour in some cases that can be difficult to predict.

Generally speaking, I simply count the number of cube reads to determine the efficiency of a rule that has fairly straightforward logic -- but if the logic gets complicated, that can certainly impact efficiency too.

In your case, you've got more cube reads in the second version of the rule, as you're reading the attribute in every case, and also attempting to read from invalid intersections (assuming the attribute is blank for elements in dim3 that don't start with 'A' or 'B').

However, the first version has significantly increased logic, with its IF statement and the calls to SUBST, so it might be line-ball.

My guess would be a hybrid of the two approaches would trump both in efficiency. I would use a Flag attribute, as follows:

['Element2'] = N: IF(ATTRN('dim3',!dim3,'Flag') = 1, ['Element1'], STET);

That way, you're only reading a numeric attribute, which is much faster than reading a string, and you've eliminated the call to SUBST.
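As a minimal sketch of how such a flag could be seeded, the TurboIntegrator Prolog below sets the attribute once from the current naming pattern. It assumes a numeric attribute named 'Flag' already exists on dim3; after this one-off load the attribute would be maintained by hand, so later elements such as "APPLES" are not flagged automatically.

```
# One-off seeding of the 'Flag' attribute (assumed to exist already on dim3)
sDim = 'dim3';
sAttr = 'Flag';
nMax = DIMSIZ(sDim);
i = 1;
WHILE(i <= nMax);
  sEl = DIMNM(sDim, i);
  sFirst = SUBST(sEl, 1, 1);
  # Flag only the elements whose name currently starts with A or B
  IF(sFirst @= 'A' % sFirst @= 'B');
    AttrPutN(1, sDim, sEl, sAttr);
  ENDIF;
  i = i + 1;
END;
```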

Still, best to do some real-world tests to check what's best in your specific environment, with your particular data.
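One rough way to run such a test is a small TurboIntegrator process that reads the rule-calculated cells and reports the elapsed time. This is only a sketch under assumptions: 'd1', 'd2', 'd4', 'd5' and 'd6' are placeholder element names, the output file name is hypothetical, and the cube's dimension order is assumed to match the DB call in the original question.

```
# Prolog of a hypothetical timing process; element names are placeholders
sCube = 'Cube';
nStart = NOW;
nTotal = 0;
nMax = DIMSIZ('dim3');
i = 1;
WHILE(i <= nMax);
  sEl = DIMNM('dim3', i);
  # Read the rule-calculated value at leaf elements of dim3 only
  IF(DTYPE('dim3', sEl) @= 'N');
    nTotal = nTotal + CellGetN(sCube, 'd1', 'd2', sEl, 'd4', 'd5', 'd6', 'Element2');
  ENDIF;
  i = i + 1;
END;
# NOW is a serial date measured in days, so multiply by 86400 for seconds
nSecs = (NOW - nStart) * 86400;
ASCIIOutput('rule_timing.txt', NumberToString(nSecs), NumberToString(nTotal));
```

A loop this small may well finish below the timer's resolution, and repeated runs can be served from cached calculations, so a meaningful comparison would need to cover far more cells and start each run from a comparable state.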

Re: Efficient rule

Posted: Sun Jun 22, 2014 2:27 pm
by Duncan P
I would be interested to see the numbers on the performance test that indicated that returning a numeric attribute is quicker than a text attribute. As all attribute values are held as strings in the underlying attribute cube (which is how you can have attributes at consolidated level) it has to do extra conversion work for the numeric attributes.

Of the two original approaches, and setting aside Declan's very valid concerns about maintainability, I would expect the second approach to be quite a bit faster - for the reasons stated by Harvey about cube reads. However I am not in a position to test this for myself.

Re: Efficient rule

Posted: Sun Jun 22, 2014 4:09 pm
by Harvey
Interesting points Duncan. I'll see if I can get inspired to run a few tests tomorrow and confirm some of these possibilities.

I'd be interested if you can provide a link to the documentation that confirms numeric attributes are stored as strings.

Are you just extrapolating from the behaviour at consolidated intersections? If so, it's also worth noting that elements in the measure dimension of the attribute cube show as numeric -- even the string ones -- so perhaps it's a special type of element that one can't create manually?

Re: Efficient rule

Posted: Sun Jun 22, 2014 4:35 pm
by declanr
Harvey,

If you run a DTYPE function on attributes, instead of "N" or "S" you get "AN", "AS" or "AA"... Also, you will notice some fun extra steps when trying to put a rule in your element attributes cube against numeric attributes (or at least you used to).

So I've always understood it to be that they are sort of their own type of elements, but the treatment is essentially as if they were string.
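A quick way to see this for yourself is to query the attribute control dimension from a TurboIntegrator process. The sketch below assumes dim3 has an attribute named 'Flag' (as in Harvey's suggested rule), and the output file name is hypothetical.

```
# Attributes of dim3 are elements of the control dimension '}ElementAttributes_dim3'
sType = DTYPE('}ElementAttributes_dim3', 'Flag');
# Per the post above: 'AN' = numeric attribute, 'AS' = string attribute, 'AA' = alias
ASCIIOutput('attr_type_check.txt', 'Flag', sType);
```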

Re: Efficient rule

Posted: Sun Jun 22, 2014 7:16 pm
by Duncan P
There is no official documentation, but there is significant evidence, as explained in this post here.

Re: Efficient rule

Posted: Wed Jun 25, 2014 9:42 am
by Harvey
You're right, Duncan. I did a test and determined that there is no significant difference between the performance of string and numeric attributes. This suggests that numeric attributes are simply regular text attributes with data input validation and automatically handled conversion.

I would still recommend numeric attributes in most cases, as the performance hit is negligible and there may come a day when IBM take advantage of the efficiency of storing numeric values over strings. Although I've heard pigs might fly too!

If performance is utterly paramount, the results of the test might convince me to use hard-coded logic, but only in cases when I could be utterly sure the structures and logic were certain to never change!

I detailed my test results in an article on the Flow Blog, along with the model and full results. If you guys have a chance to check it out, let me know if my thoughts and methodologies are sound.

Re: Efficient rule

Posted: Wed Jun 25, 2014 11:28 pm
by EvgenyT
Interesting findings, thanks Harvey.