The new Desig function
Tuesday, March 16th, 2010 at 8:16 pm
I have a minimal library of trivial functions in my C++ code. Things like:
inline double Square(double x)
{ return x*x; }
inline double Len(const P3& a)
{ return sqrt(Square(a.x) + Square(a.y) + Square(a.z)); }
inline P3 ConvertGZ(const P2& a, double z)
{ return P3(a.u, a.v, z); }
inline P2 operator*(const P2& a, double lam)
{ return P2(a.u * lam, a.v * lam); }
// note scalar multiplication on right.
inline int Signum(double x)
{ return (x < 0.0 ? -1 : (x > 0.0 ? 1 : 0)); }
After years of it being pretty stable, I have discovered a new function that is worth adding:
inline double Desig(double x, bool bpositive)
{ return (bpositive ? x : -x); }
Here is how it looks in some real code:
double zlo = tpf.z + tz.z * lzoclo
- Desig(sqrt(ezsq), bupdir) * rad;
Previously, this would have been:
double zlo = tpf.z + tz.z * lzoclo
- (bupdir ? sqrt(ezsq) : -sqrt(ezsq)) * rad;
Or even:
double zlo = tpf.z + tz.z * lzoclo
- sqrt(ezsq) * rad * (bupdir ? 1 : -1);
But that relies on the compiler being clever enough to avoid the extra multiplication and to recognize it as a simple sign inversion.
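As a sanity check that the three spellings agree, here is a small self-contained sketch. The function names ZloDesig, ZloTernary and ZloMultiply are made up for the illustration, and the point members are passed in as plain doubles because the real point types are not shown here.

```cpp
#include <cmath>

// Returns x when bpositive is true, otherwise -x.
inline double Desig(double x, bool bpositive)
    { return (bpositive ? x : -x); }

// Form 1: using Desig().
inline double ZloDesig(double tpfz, double tzz, double lzoclo,
                       double ezsq, double rad, bool bupdir)
{
    return tpfz + tzz * lzoclo
         - Desig(std::sqrt(ezsq), bupdir) * rad;
}

// Form 2: the ternary written out, with sqrt() spelled twice.
inline double ZloTernary(double tpfz, double tzz, double lzoclo,
                         double ezsq, double rad, bool bupdir)
{
    return tpfz + tzz * lzoclo
         - (bupdir ? std::sqrt(ezsq) : -std::sqrt(ezsq)) * rad;
}

// Form 3: multiplying by +1 or -1.
inline double ZloMultiply(double tpfz, double tzz, double lzoclo,
                          double ezsq, double rad, bool bupdir)
{
    return tpfz + tzz * lzoclo
         - std::sqrt(ezsq) * rad * (bupdir ? 1 : -1);
}
```

Negating a double and multiplying it by 1 or -1 are both exact operations, so the three forms should return identical values for the same inputs; they differ only in readability and in the code the compiler generates.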
These little inline functions get around limitations in the expressiveness of the C language, where there is a negator operator, but no negator type with which I could have written:
double zlo = tpf.z + tz.z * lzoclo
- (bupdir ? + : -)sqrt(ezsq) * rad;
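One way to approximate such a negator type in C++ is a tiny wrapper class with an overloaded operator. This is only a sketch: the Sign struct and its operator* are hypothetical names invented for the illustration, not part of the library described in this post.

```cpp
// Hypothetical "negator type": holds either a plus or a minus.
struct Sign
{
    bool bpositive;
    explicit Sign(bool lbpositive) : bpositive(lbpositive) {}
};

// Applying a Sign to a value either keeps it or negates it.
inline double operator*(Sign s, double x)
    { return (s.bpositive ? x : -x); }
```

With this, Sign(bupdir) * sqrt(ezsq) * rad reads fairly close to the wished-for (bupdir ? + : -)sqrt(ezsq) * rad, at the cost of an extra type. A plain function like Desig() gets the same effect with less machinery.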
Using this Desig() function, the compiler now knows exactly what I want: a sign change:
double z1 = tpf.z + tz.z * lzohi + t * rad * Desig(ez, bupdir);
10227915  test        bl,bl
10227917  fld         qword ptr [esi+48h]
1022791A  jne         (1022791E)
1022791C  fchs
1022791E  mov         edx,dword ptr [esi+24h]
10227921  fmul        st,st(4)
10227923  fmul        st,st(5)
10227925  fxch        st(2)
10227927  fmul        qword ptr [ecx+10h]
1022792A  faddp       st(2),st
1022792C  fxch        st(1)
1022792E  fadd        qword ptr [edx+10h]
It looks convincing: a single jump over an fchs (change sign) instruction, inherited from the 8087 floating point coprocessor instruction set.
Hm. I never thought of looking at disassembly code before. Let’s check the alternatives.
double z1 = tpf.z + tz.z * lzohi + t * rad * (bupdir ? ez : -ez);
10227909  fld         qword ptr [esi+48h]
1022790C  mov         bl,1
1022790E  jmp         (10227917)
10227910  fld         qword ptr [esi+48h]
10227913  xor         bl,bl
10227915  fchs
10227917  mov         edx,dword ptr [esi+24h]
1022791A  fmul        st,st(4)
1022791C  fmul        st,st(5)
1022791E  fxch        st(2)
10227920  fmul        qword ptr [ecx+10h]
10227923  faddp       st(2),st
10227925  fxch        st(1)
10227927  fadd        qword ptr [edx+10h]
And:
double z1 = tpf.z + tz.z * lzohi + t * rad * ez * (bupdir ? 1 : -1);
1022790F  xor         eax,eax
10227911  test        dl,dl
10227913  setne       al
10227916  mov         edi,dword ptr [esi+24h]
10227919  lea         eax,[eax+eax-1]
1022791D  mov         dword ptr [esp+8],eax
10227921  fild        dword ptr [esp+8]
10227925  fmul        st,st(4)
10227927  fmul        st,st(5)
10227929  fmul        qword ptr [esi+48h]
1022792C  fxch        st(2)
1022792E  fmul        qword ptr [ecx+10h]
10227931  faddp       st(2),st
10227933  fxch        st(1)
10227935  fadd        qword ptr [edi+10h]
Hm. Seems to do this without any jumps.
SETNE was added with the 80386, and its description says it means “Set byte to one on condition”.
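The setne and lea pair in the listing above appears to compute 2*b - 1 on the integer value of the bool, which maps true to 1 and false to -1 without a branch; the fild then loads that integer as a floating point value for the multiply. Written out by hand, that would be roughly the following sketch (SignFactor and DesigByMultiply are made-up names for illustration):

```cpp
// Maps true -> 1 and false -> -1 using integer arithmetic only,
// mirroring "setne al" followed by "lea eax,[eax+eax-1]".
inline int SignFactor(bool b)
    { return 2 * static_cast<int>(b) - 1; }

// The multiply-by-sign form, built on the branch-free factor.
inline double DesigByMultiply(double x, bool bpositive)
    { return x * SignFactor(bpositive); }
```

So the jump is avoided, but at the price of an integer-to-float conversion through memory and an extra fmul.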
What I don’t much like is the complexity of the code generated for the line above it in the source:
bool bupdir = (tz.z >= 0.0);
102278FA  fldz
102278FC  mov         ecx,dword ptr [esi+28h]
102278FF  fcomp       qword ptr [ecx+10h]
10227902  fnstsw      ax
10227904  test        ah,41h
10227907  jp          (1022790D)
10227909  mov         dl,1
1022790B  jmp         (1022790F)
1022790D  xor         dl,dl
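Part of the complexity seems to come from the x87 comparison having four outcomes (less, equal, greater, unordered) that are reported as status bits rather than as a plain true or false. The sketch below is a hand emulation of my reading of that listing, not compiler output; the names MaskedCompareZeroWith, EvenParity and BupdirEmulated are invented for the illustration.

```cpp
#include <cmath>

// Bits of AH after "fnstsw ax" that the listing keeps with the mask 41h.
const unsigned C0 = 0x01;   // set when ST(0) < operand
const unsigned C3 = 0x40;   // set when ST(0) == operand

// Emulates "fldz / fcomp [z] / fnstsw ax / test ah,41h":
// compares 0.0 against z and returns the masked condition bits.
inline unsigned MaskedCompareZeroWith(double z)
{
    if (std::isnan(z))
        return C3 | C0;     // unordered sets C3, C2 and C0; the mask drops C2
    if (0.0 < z)
        return C0;
    if (0.0 == z)
        return C3;
    return 0;               // 0.0 > z sets none of the bits
}

// "jp" is taken when the tested byte has an even number of set bits.
inline bool EvenParity(unsigned bits)
{
    int count = 0;
    for (unsigned b = bits & 0xFFu; b != 0; b >>= 1)
        count += static_cast<int>(b & 1u);
    return (count % 2) == 0;
}

// The jump lands on "xor dl,dl" (false); falling through gives "mov dl,1".
inline bool BupdirEmulated(double z)
    { return !EvenParity(MaskedCompareZeroWith(z)); }
```

Read this way, the parity test is a compact trick for "exactly one of C0 and C3 is set", which is true for a positive or zero tz.z and false for a negative or NaN one, matching what >= has to return.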
I better quickly stop before I get drawn any further into this…
1 Comment
1. James Cranch replies at 18th March 2010, 6:03 pm :
I’m glad it’s entered your standard library!