## The new Desig function

Tuesday, March 16th, 2010 at 8:16 pm

I have a minimal library of trivial functions in my C++ code. Things like:

```
inline double Square(double x)
{  return x*x; }

inline double Len(const P3& a)
{  return sqrt(Square(a.x) + Square(a.y) + Square(a.z)); }

inline P3 ConvertGZ(const P2& a, double z)
{  return P3(a.u, a.v, z); }

inline P2 operator*(const P2& a, double lam)
{  return P2(a.u * lam, a.v * lam); }
// note scalar multiplication on right.

inline int Signum(double x)
{  return (x < 0.0 ? -1 : (x > 0.0 ? 1 : 0)); }
```

After years of it being pretty stable, I have discovered a new function that is worth adding:

```
inline double Desig(double x, bool bpositive)
{  return (bpositive ? x : -x);  }
```

Here is how it looks in some real code:

```
double zlo = tpf.z + tz.z * lzoclo
    - Desig(sqrt(ezsq), bupdir) * rad;
```

Previously, this would have been:

```
double zlo = tpf.z + tz.z * lzoclo
    - (bupdir ? sqrt(ezsq) : -sqrt(ezsq)) * rad;
```

Or even:

```
double zlo = tpf.z + tz.z * lzoclo
    - sqrt(ezsq) * rad * (bupdir ? 1 : -1);
```

But that relies on the compiler being clever enough to skip the extra multiplication and recognize it as a simple sign inversion.
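As a sanity check that the three spellings really compute the same thing, here is a minimal sketch. The wrapper functions `ZloDesig`, `ZloTernary` and `ZloMultiply` are hypothetical names invented for this illustration, `base` stands in for `tpf.z + tz.z * lzoclo`, and the `Desig` form assumes the obvious substitution into the earlier expression.

```cpp
#include <cmath>

// Desig as defined in this post: keep x when bpositive is true, negate it otherwise.
inline double Desig(double x, bool bpositive)
{  return (bpositive ? x : -x);  }

// Hypothetical wrappers, one per spelling. base stands in for tpf.z + tz.z * lzoclo.
inline double ZloDesig(double base, double ezsq, double rad, bool bupdir)
{  return base - Desig(std::sqrt(ezsq), bupdir) * rad;  }

inline double ZloTernary(double base, double ezsq, double rad, bool bupdir)
{  return base - (bupdir ? std::sqrt(ezsq) : -std::sqrt(ezsq)) * rad;  }

inline double ZloMultiply(double base, double ezsq, double rad, bool bupdir)
{  return base - std::sqrt(ezsq) * rad * (bupdir ? 1 : -1);  }
```

All three should agree for either value of `bupdir`; they differ only in how much they ask the compiler to work out for itself.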

These little inline functions get around limitations in the expressiveness of the C language, where there is a negator operator, but no negator type with which I could have written:

```
double zlo = tpf.z + tz.z * lzoclo
    - (bupdir ? + : -)sqrt(ezsq) * rad;
```
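For what it is worth, C++ can get fairly close to a "negator type" with a small function object. The sketch below is only an illustration: `Negator` and `PlusOrMinus` are hypothetical names, not part of the library described in this post.

```cpp
// Hypothetical "negator type": a value that stands for either unary + or unary -.
struct Negator
{
    bool bpositive;

    // Applying the object to a number applies the sign it stands for.
    double operator()(double x) const
    {  return (bpositive ? x : -x);  }
};

// Picks the + or - flavour at run time, like the imaginary (b ? + : -).
inline Negator PlusOrMinus(bool bpositive)
{  return Negator{bpositive};  }
```

With this in place, the wished-for expression could be spelled `PlusOrMinus(bupdir)(sqrt(ezsq)) * rad`, which is arguably no clearer than a plain helper function taking two arguments.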

Using this Desig() function, the compiler now knows exactly what I want: a sign change:

```
double z1 = tpf.z + tz.z * lzohi + t * rad * Desig(ez, bupdir);
10227915  test        bl,bl
10227917  fld         qword ptr [esi+48h]
1022791A  jne         (1022791E)
1022791C  fchs
1022791E  mov         edx,dword ptr [esi+24h]
10227921  fmul        st,st(4)
10227923  fmul        st,st(5)
10227925  fxch        st(2)
10227927  fmul        qword ptr [ecx+10h]
1022792C  fxch        st(1)
```

It looks convincing: a single jump over an fchs (change sign) instruction, inherited from the 8087 floating point coprocessor instruction set.

Hm. I never thought of looking at disassembly code before. Let’s check the alternatives.

```
double z1 = tpf.z + tz.z * lzohi + t * rad * (bupdir ? ez : -ez);
10227909  fld         qword ptr [esi+48h]
1022790C  mov         bl,1
1022790E  jmp         (10227917)
10227910  fld         qword ptr [esi+48h]
10227913  xor         bl,bl
10227915  fchs
10227917  mov         edx,dword ptr [esi+24h]
1022791A  fmul        st,st(4)
1022791C  fmul        st,st(5)
1022791E  fxch        st(2)
10227920  fmul        qword ptr [ecx+10h]
10227925  fxch        st(1)
```

And:

```
double z1 = tpf.z + tz.z * lzohi + t * rad * ez * (bupdir ? 1 : -1);
1022790F  xor         eax,eax
10227911  test        dl,dl
10227913  setne       al
10227916  mov         edi,dword ptr [esi+24h]
10227919  lea         eax,[eax+eax-1]
1022791D  mov         dword ptr [esp+8],eax
10227921  fild        dword ptr [esp+8]
10227925  fmul        st,st(4)
10227927  fmul        st,st(5)
10227929  fmul        qword ptr [esi+48h]
1022792C  fxch        st(2)
1022792E  fmul        qword ptr [ecx+10h]
10227933  fxch        st(1)
```

Hm. Seems to do this without any jumps.

SETNE was added with the 80386, and its description says it means “Set byte to one on condition”.
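Reading the listing, the `setne` / `lea` pair appears to turn the flag into 0 or 1 and then compute `2*flag - 1`, which gives -1 or +1 with no branch. A minimal C++ sketch of that arithmetic follows; `SignFromFlag` and `TimesSign` are hypothetical names used only here.

```cpp
// Mirrors the setne / lea pair: make the flag 0 or 1, then map it to -1 or +1.
inline int SignFromFlag(bool bupdir)
{
    int flag = (bupdir ? 1 : 0);   // setne al
    return flag + flag - 1;        // lea eax,[eax+eax-1]
}

// The integer sign is then converted to floating point (fild) and multiplied in (fmul).
inline double TimesSign(double ez, bool bupdir)
{  return ez * SignFromFlag(bupdir);  }
```

So the jump is gone, but the price is an integer-to-float conversion through the stack and one more multiplication.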

I don’t much like the complexity of what gets generated for the source line just before it, though:

```
bool bupdir = (tz.z >= 0.0);
102278FA  fldz
102278FC  mov         ecx,dword ptr [esi+28h]
102278FF  fcomp       qword ptr [ecx+10h]
10227902  fnstsw      ax
10227904  test        ah,41h
10227907  jp          (1022790D)
10227909  mov         dl,1
1022790B  jmp         (1022790F)
1022790D  xor         dl,dl
```
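My reading of this listing, offered as a sketch rather than a definitive account: `fldz` / `fcomp` compares 0.0 with the operand (assumed here to be `tz.z`), `fnstsw ax` copies the x87 condition bits into AH, `test ah,41h` keeps only C3 (0x40) and C0 (0x01), and `jp` jumps when the masked value has an even number of set bits. The function names below are hypothetical.

```cpp
#include <cmath>

// Emulates the x87 condition bits as they would appear in AH after fnstsw ax,
// for the comparison of ST(0) = 0.0 against the operand z.
inline unsigned AhAfterCompareZeroWith(double z)
{
    const unsigned C0 = 0x01, C2 = 0x04, C3 = 0x40;
    if (std::isnan(z))  return C0 | C2 | C3;  // unordered: all three bits set
    if (0.0 < z)        return C0;            // ST(0) < operand
    if (0.0 == z)       return C3;            // ST(0) == operand
    return 0;                                 // ST(0) > operand
}

// Emulates test ah,41h followed by jp.
inline bool BupdirFromFlags(double z)
{
    unsigned masked = AhAfterCompareZeroWith(z) & 0x41;
    // Even parity (no bits, or both C3 and C0) means the jump is taken.
    bool evenparity = (masked == 0x00 || masked == 0x41);
    // Jump taken -> xor dl,dl (false); not taken -> mov dl,1 (true).
    return !evenparity;
}
```

The parity trick seems to be what lets one conditional jump cover both "less than zero" and the NaN case, so the result matches `z >= 0.0` even for unordered inputs.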

I had better stop quickly before I get drawn any further into this…

### 1 Comment

• 1. James Cranch replies at 18th March 2010, 6:03 pm :