Freesteel Blog » The new Desig function

Tuesday, March 16th, 2010 at 8:16 pm

I have a minimal library of trivial functions in my C++ code. Things like:

inline double Square(double x)
{  return x*x; }

inline double Len(const P3& a)
{  return sqrt(Square(a.x) + Square(a.y) + Square(a.z)); }

inline P3 ConvertGZ(const P2& a, double z)
{  return P3(a.u, a.v, z); }

inline P2 operator*(const P2& a, double lam)  
{  return P2(a.u * lam, a.v * lam); }
   // note scalar multiplication on right. 

inline int Signum(double x)
{  return (x < 0.0 ? -1 : (x > 0.0 ? 1 : 0)); }
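The helpers above depend on the point types P2 and P3, which the post does not show. As a minimal sketch for compiling them in isolation, the structs below are assumed stand-ins that carry only the members the helpers touch; the real classes are presumably richer.

```cpp
#include <cmath>

// Assumed stand-ins for the post's point types (not the original definitions).
struct P2
{
    double u, v;
    P2(double lu, double lv) : u(lu), v(lv) {}
};

struct P3
{
    double x, y, z;
    P3(double lx, double ly, double lz) : x(lx), y(ly), z(lz) {}
};

inline double Square(double x)
{  return x*x; }

// Euclidean length of a 3D vector.
inline double Len(const P3& a)
{  return std::sqrt(Square(a.x) + Square(a.y) + Square(a.z)); }

// Lifts a 2D point to 3D at the given height z.
inline P3 ConvertGZ(const P2& a, double z)
{  return P3(a.u, a.v, z); }

// Scalar multiplication with the scalar on the right.
inline P2 operator*(const P2& a, double lam)
{  return P2(a.u * lam, a.v * lam); }

// -1, 0 or 1 according to the sign of x.
inline int Signum(double x)
{  return (x < 0.0 ? -1 : (x > 0.0 ? 1 : 0)); }
```

With these definitions, an expression such as Len(ConvertGZ(P2(3.0, 4.0) * 1.0, 12.0)) scales a 2D point, lifts it to height 12, and measures the resulting vector.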

After years of it being pretty stable, I have discovered a new function that is worth adding:

inline double Desig(double x, bool bpositive)
{  return (bpositive ? x : -x);  }
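A small sketch of how the function behaves on its own. OffsetZ is a hypothetical helper invented for this illustration, not part of the original library.

```cpp
// Returns x unchanged when bpositive is true, and -x otherwise.
inline double Desig(double x, bool bpositive)
{  return (bpositive ? x : -x);  }

// Hypothetical helper (an assumption for illustration): moves a base height
// up or down by rad depending on the direction flag.
inline double OffsetZ(double zbase, double rad, bool bupdir)
{  return zbase + Desig(rad, bupdir); }
```

Note that Desig does not take an absolute value first: it flips the sign of whatever it is given, so Desig(-1.0, false) comes out positive.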

Here is how it looks in some real code:

double zlo = tpf.z + tz.z * lzoclo 
             - Desig(sqrt(ezsq), bupdir) * rad; 

Previously, this would have been:

double zlo = tpf.z + tz.z * lzoclo 
             - (bupdir ? sqrt(ezsq) : -sqrt(ezsq)) * rad; 

Or even:

double zlo = tpf.z + tz.z * lzoclo 
             - sqrt(ezsq) * rad * (bupdir ? 1 : -1); 

But that relies on the compiler being clever enough to skip the extra multiplication and to recognize it as a simple sign inversion.
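The three spellings can be put side by side as a sketch. The function names and the plain double parameters are assumptions standing in for tpf.z, tz.z, lzoclo, ezsq and rad from the real code.

```cpp
#include <cmath>

inline double Desig(double x, bool bpositive)
{  return (bpositive ? x : -x);  }

// Version using Desig().
inline double ZloDesig(double tpfz, double tzz, double lzoclo,
                       double ezsq, double rad, bool bupdir)
{  return tpfz + tzz * lzoclo - Desig(std::sqrt(ezsq), bupdir) * rad; }

// Version with the conditional written out in full.
inline double ZloTernary(double tpfz, double tzz, double lzoclo,
                         double ezsq, double rad, bool bupdir)
{  return tpfz + tzz * lzoclo
          - (bupdir ? std::sqrt(ezsq) : -std::sqrt(ezsq)) * rad; }

// Version that multiplies by +1 or -1.
inline double ZloMultiply(double tpfz, double tzz, double lzoclo,
                          double ezsq, double rad, bool bupdir)
{  return tpfz + tzz * lzoclo
          - std::sqrt(ezsq) * rad * (bupdir ? 1 : -1); }
```

All three should give the same number, since negating a floating point value is exact; they differ only in what they ask of the compiler.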

These little inline functions get around limitations in the expressiveness of the C language, where there is a negation operator, but no negator type with which I could have written:

double zlo = tpf.z + tz.z * lzoclo 
             - (bupdir ? + : -)sqrt(ezsq) * rad; 

With this Desig() function, the compiler now knows exactly what I want, a sign change:

double z1 = tpf.z + tz.z * lzohi + t * rad * Desig(ez, bupdir); 
10227915  test        bl,bl 
10227917  fld         qword ptr [esi+48h] 
1022791A  jne         (1022791E) 
1022791C  fchs             
1022791E  mov         edx,dword ptr [esi+24h] 
10227921  fmul        st,st(4) 
10227923  fmul        st,st(5) 
10227925  fxch        st(2) 
10227927  fmul        qword ptr [ecx+10h] 
1022792A  faddp       st(2),st 
1022792C  fxch        st(1) 
1022792E  fadd        qword ptr [edx+10h] 

It looks convincing: a single jump over an fchs (change sign) instruction, inherited from the 8087 floating point coprocessor instruction set.

Hm. I never thought of looking at disassembly code before. Let’s check the alternatives.

double z1 = tpf.z + tz.z * lzohi + t * rad * (bupdir ? ez : -ez);
10227909  fld         qword ptr [esi+48h] 
1022790C  mov         bl,1 
1022790E  jmp         (10227917) 
10227910  fld         qword ptr [esi+48h] 
10227913  xor         bl,bl 
10227915  fchs             
10227917  mov         edx,dword ptr [esi+24h] 
1022791A  fmul        st,st(4) 
1022791C  fmul        st,st(5) 
1022791E  fxch        st(2) 
10227920  fmul        qword ptr [ecx+10h] 
10227923  faddp       st(2),st 
10227925  fxch        st(1) 
10227927  fadd        qword ptr [edx+10h] 

And:

double z1 = tpf.z + tz.z * lzohi + t * rad * ez * (bupdir ? 1 : -1); 
1022790F  xor         eax,eax 
10227911  test        dl,dl 
10227913  setne       al   
10227916  mov         edi,dword ptr [esi+24h] 
10227919  lea         eax,[eax+eax-1] 
1022791D  mov         dword ptr [esp+8],eax 
10227921  fild        dword ptr [esp+8] 
10227925  fmul        st,st(4) 
10227927  fmul        st,st(5) 
10227929  fmul        qword ptr [esi+48h] 
1022792C  fxch        st(2) 
1022792E  fmul        qword ptr [ecx+10h] 
10227931  faddp       st(2),st 
10227933  fxch        st(1) 
10227935  fadd        qword ptr [edi+10h] 

Hm. Seems to do this without any jumps.

The description of the instructions added with the 80386 says SETNE means “Set byte to one on condition”.
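Reading the listing above, setne leaves 0 or 1 in al, and lea eax,[eax+eax-1] then computes 2*eax - 1, which turns 0 into -1 and 1 into +1 with no branch. The same arithmetic can be sketched in C++; SignFromBool and DesigBranchless are hypothetical names for this illustration, and nothing guarantees that a given compiler emits jump-free code for them.

```cpp
// Maps false to -1 and true to +1 using arithmetic only,
// mirroring the setne / lea pair: 2*b - 1.
inline int SignFromBool(bool b)
{  return 2 * static_cast<int>(b) - 1; }

// Sign change expressed as a multiplication by +1 or -1.
inline double DesigBranchless(double x, bool bpositive)
{  return x * SignFromBool(bpositive); }
```

The cost of this form is visible in the disassembly: the integer has to be stored to memory, loaded back with fild, and then multiplied in, instead of a single fchs.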

I don’t much like the complexity generated for the line above it in the source, where bupdir is set:

bool bupdir = (tz.z >= 0.0); 
102278FA  fldz             
102278FC  mov         ecx,dword ptr [esi+28h] 
102278FF  fcomp       qword ptr [ecx+10h] 
10227902  fnstsw      ax   
10227904  test        ah,41h 
10227907  jp          (1022790D) 
10227909  mov         dl,1 
1022790B  jmp         (1022790F) 
1022790D  xor         dl,dl 

I’d better stop quickly before I get drawn any further into this…

1 Comment

  • 1. James Cranch replies at 18th March 2010, 6:03 pm :

    I’m glad it’s entered your standard library!
