## Nine finger exercises

Tuesday, October 24th, 2006 at 5:34 pm

When professional authors write pot-boilers and turn out books by the yard that don’t require much thought on their part to type out, they are doing ten finger exercises. They produce a hundred thousand words of some kind of routine adventure fantasy in three months with the same plot as the previous volume of the decology, it’s printed, and the fans buy it.

It also describes what I’m doing now: programming a set of optimizations to an algorithm of a kind I have written several times already. The slow version, which works, got written in an hour. Now I have to type out at great length an optimized version that produces exactly the same answer in one hundredth of the computational time.

The function in question, in this instance, is `DistanceToPath`. You have a toolpath — a piecewise linear path, which is nothing more than a chain of XYZ points with straight lines in between — and another point in space, and you need to calculate the shortest distance between the point and the path.

Let’s start with the distance between a point `p` and a straight line `AB`. The function goes like this:

```
double DistanceToLine(p, A, B)
{
    lam = Dot(p-A, B-A) / Dot(B-A, B-A)
    if (lam < 0)
        return sqrt(Dot(p-A, p-A))
    if (lam > 1)
        return sqrt(Dot(p-B, p-B))
    q = A + lam * (B-A)
    return sqrt(Dot(q-p, q-p))
}
```

I’ve got much better notation than this, but since you don’t want to read my library of simple vector-calculation classes, I’m using as few functions as possible. I call my XYZ class `P3`. My 2D point class is called `P2` and its two variables are `u` and `v`, so if I use a `P3` where a `P2` is expected in the code, it won’t compile. Obviously the ‘-’ operator is overloaded between `P3`s, and `Dot(A, B)` is the dot product of the two vectors:

```
double Dot(A, B)
{
    return A.x*B.x + A.y*B.y + A.z*B.z
}
```

The length of a vector `A` is `sqrt(Dot(A, A))`.
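In runnable form (Python here, with plain tuples standing in for `P3`; the lower-case names are mine), the two functions above come out as:

```python
import math

def dot(a, b):
    # Dot product of two 3-vectors given as (x, y, z) tuples
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def sub(a, b):
    return (a[0]-b[0], a[1]-b[1], a[2]-b[2])

def distance_to_line(p, A, B):
    # Shortest distance from point p to the line segment AB
    lam = dot(sub(p, A), sub(B, A)) / dot(sub(B, A), sub(B, A))
    if lam < 0:
        return math.sqrt(dot(sub(p, A), sub(p, A)))   # nearest to endpoint A
    if lam > 1:
        return math.sqrt(dot(sub(p, B), sub(p, B)))   # nearest to endpoint B
    q = (A[0] + lam*(B[0]-A[0]), A[1] + lam*(B[1]-A[1]), A[2] + lam*(B[2]-A[2]))
    return math.sqrt(dot(sub(q, p), sub(q, p)))
```

For example, the point `(0, 1, 0)` is distance 1 from the segment running from the origin to `(2, 0, 0)`, and the point `(3, 0, 0)` is distance 1 from its endpoint `B`.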

Now, suppose I have a piecewise linear toolpath represented by `path`, which is of type `array<P3>`, so we have functions like `path.size()` and `path[i]`, which is the `(i+1)`th `P3` value in the path.

Now we are ready to define the main function:

```
DistanceToPath(p, path)
{
    d = sqrt(Dot(path[0]-p, path[0]-p))
    for (i = 1; i < path.size(); i++)
        d = min(d, DistanceToLine(p, path[i-1], path[i]))
    return d
}
```

That’s the sum total of the function. So how have I expanded it up into 300 lines of code? Well, if `path.size()` is very big (as it is for most toolpaths), then this is going to take a while to run, and a lot of the time will be spent calling the `DistanceToLine()` function on line segments that couldn’t possibly contribute to the value of `d`.
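Carrying the same Python sketch forward (re-declaring the helpers so the snippet stands alone; clamping `lam` to `[0, 1]` is equivalent to the two early returns):

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def distance_to_line(p, A, B):
    AB = _sub(B, A)
    lam = max(0.0, min(1.0, _dot(_sub(p, A), AB) / _dot(AB, AB)))
    q = tuple(a + lam * d for a, d in zip(A, AB))
    return math.sqrt(_dot(_sub(q, p), _sub(q, p)))

def distance_to_path(p, path):
    # Seed d with the distance to the first point, then scan every segment
    d = math.sqrt(_dot(_sub(path[0], p), _sub(path[0], p)))
    for i in range(1, len(path)):
        d = min(d, distance_to_line(p, path[i-1], path[i]))
    return d
```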

Optimization is possible when the call to the function `DistanceToPath` is not isolated, that is, it’s going to be called a billion times with the same path, but for different points. Then it pays to do some precalculation.

Here is the simplest boxing strategy (sometimes known as bucketing).

Create the class:

```
class strip
{
    int i0, i1;
    double xlo, xhi;
}
```

And do the following precalculation to create an array of these strips:

```
CreateBoxing(path)
{
    strips = array<strip>[(path.size() - 2) / 10 + 1]
    for (i = 1; i < path.size(); i++)
    {
        s = (i - 1) / 10
        if (s * 10 + 1 == i)
        {
            strips[s].i0 = i-1
            strips[s].xlo = path[i-1].x
            strips[s].xhi = path[i-1].x
        }
        strips[s].i1 = i
        strips[s].xlo = min(strips[s].xlo, path[i].x)
        strips[s].xhi = max(strips[s].xhi, path[i].x)
    }
    return strips
}
```

This simply divides up the path into groups of 10 line segments and records the range in the X-coordinate which they span. Now we can write the possibly 10 times faster version of the function based on this precalculation, which goes like:

```
DistanceToPath(p, path, strips)
{
    d = sqrt(Dot(path[0]-p, path[0]-p))
    for (s = 0; s < strips.size(); s++)
    {
        if ((p.x - d <= strips[s].xhi) and (p.x + d >= strips[s].xlo))
        {
            for (i = strips[s].i0 + 1; i <= strips[s].i1; i++)
                d = min(d, DistanceToLine(p, path[i-1], path[i]))
        }
    }
    return d
}
```

All it does is skip over the grouped sections in the path which are too far away from the point to make a contribution to the result.
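To make the whole scheme concrete, here is a runnable Python sketch (tuples for `P3`, a group size of 10, and names of my own); the boxed function is checked against the brute-force scan, since the pruning must never change the answer:

```python
import math, random

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def distance_to_line(p, A, B):
    AB = _sub(B, A)
    lam = max(0.0, min(1.0, _dot(_sub(p, A), AB) / _dot(AB, AB)))
    q = tuple(a + lam * d for a, d in zip(A, AB))
    return math.sqrt(_dot(_sub(q, p), _sub(q, p)))

def distance_to_path(p, path):
    # Brute force: test every segment
    d = math.sqrt(_dot(_sub(path[0], p), _sub(path[0], p)))
    for i in range(1, len(path)):
        d = min(d, distance_to_line(p, path[i-1], path[i]))
    return d

def create_boxing(path, groupsize=10):
    # Each strip records a run of segments (i0+1 .. i1) and their x-range
    strips = []
    for i in range(1, len(path)):
        if (i - 1) % groupsize == 0:
            strips.append({'i0': i - 1, 'i1': i,
                           'xlo': path[i-1][0], 'xhi': path[i-1][0]})
        s = strips[-1]
        s['i1'] = i
        s['xlo'] = min(s['xlo'], path[i][0])
        s['xhi'] = max(s['xhi'], path[i][0])
    return strips

def distance_to_path_boxed(p, path, strips):
    d = math.sqrt(_dot(_sub(path[0], p), _sub(path[0], p)))
    for s in strips:
        # Skip any strip whose x-range lies wholly further away than d
        if p[0] - d <= s['xhi'] and p[0] + d >= s['xlo']:
            for i in range(s['i0'] + 1, s['i1'] + 1):
                d = min(d, distance_to_line(p, path[i-1], path[i]))
    return d

random.seed(1)
path = [(i * 0.1, math.sin(i * 0.3), 0.0) for i in range(200)]
strips = create_boxing(path)
for _ in range(50):
    p = (random.uniform(-1, 21), random.uniform(-2, 2), 0.0)
    assert abs(distance_to_path_boxed(p, path, strips) - distance_to_path(p, path)) < 1e-9
```

The skip test is conservative: a strip is only dropped when every point in it must be further away in x than the best distance found so far, so the two functions always agree.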

The whole idea is very simple, but to explain it in code this way has required a huge simplification of what I have actually put in the real implementation, and it still looks terrible. Hence all this finger exercising to turn what is a straightforward idea into code. Someone ought to sort this problem out and develop a notation to communicate it properly.

There are some real questions here that could be studied with some statistical geometry. In this example I have boxed up the path in groups of 10 segments and made them into strips. I could have recorded a range in y as well as x, but it would have doubled the length of this code. What group size should I have used? If I had picked the value 1, then `strips.size()` would have equalled the number of segments, and `DistanceToPath` would have done essentially the same amount of work as before, since checking a strip’s x-range costs about as much as the `DistanceToLine` call it saves. If I had picked the number of segments, then `strips.size()` would have been 1, and again nothing would be saved. These two extreme values correspond to putting every line segment into its own box, or putting them all in a single box. For all values in between, `DistanceToLine` is called fewer times, on average, and there is an improvement.

What is the optimal value? The graph of the optimization (average number of times `DistanceToLine` is called for a random sample of points, plotted against the group size) can be made, but it would require a fancy statistical model to predict what’s going on.
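That graph is easy to produce empirically even without the model. A sketch of the experiment (my own setup: a random 2D path for brevity, and a counter wired into the segment test):

```python
import math, random

def make_strips(path, groupsize):
    # Group consecutive segments and record their x-range, as in the text
    strips = []
    for i in range(1, len(path)):
        if (i - 1) % groupsize == 0:
            strips.append([i - 1, i, path[i-1][0], path[i-1][0]])
        s = strips[-1]
        s[1] = i
        s[2] = min(s[2], path[i][0])
        s[3] = max(s[3], path[i][0])
    return [tuple(s) for s in strips]

def count_segment_tests(p, path, strips):
    # Runs the boxed distance query, returning how many segments it tested
    d = math.dist(p, path[0])
    tests = 0
    for (i0, i1, xlo, xhi) in strips:
        if p[0] - d <= xhi and p[0] + d >= xlo:
            for i in range(i0 + 1, i1 + 1):
                tests += 1
                (ax, ay), (bx, by) = path[i-1], path[i]
                lam = max(0.0, min(1.0,
                    ((p[0]-ax)*(bx-ax) + (p[1]-ay)*(by-ay))
                    / ((bx-ax)**2 + (by-ay)**2)))
                d = min(d, math.hypot(ax + lam*(bx-ax) - p[0],
                                      ay + lam*(by-ay) - p[1]))
    return tests

random.seed(0)
path = [(i * 0.05, random.uniform(-1.0, 1.0)) for i in range(1000)]
points = [(random.uniform(0.0, 50.0), random.uniform(-3.0, 3.0)) for _ in range(200)]
for groupsize in (1, 5, 10, 50, 999):
    strips = make_strips(path, groupsize)
    avg = sum(count_segment_tests(p, path, strips) for p in points) / len(points)
    print(groupsize, round(avg, 1))
```

With everything in one box every query tests all 999 segments; the intermediate group sizes test far fewer.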

An alternative boxing scenario is to use fixed strips. In this case each strip would have its `xlo,xhi` values set at the start, and the index range `i0,i1` would be replaced with an `array<int>` listing the line segments which overlap the strip. Here, if you had one strip that covered the whole width of the path, it would be as before, and if you had too many narrow strips, each line segment would end up listed in hundreds of them. Where is the optimum?
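A sketch of that variant (the strip width `w` and the names are my own; each strip holds the indices of the segments whose x-range overlaps it):

```python
def create_fixed_strips(path, w):
    # Fixed-width vertical strips spanning the x-extent of the path
    xmin = min(pt[0] for pt in path)
    xmax = max(pt[0] for pt in path)
    nstrips = int((xmax - xmin) / w) + 1
    strips = [[] for _ in range(nstrips)]
    for i in range(1, len(path)):
        # A segment is listed in every strip its x-range overlaps
        slo = int((min(path[i-1][0], path[i][0]) - xmin) / w)
        shi = int((max(path[i-1][0], path[i][0]) - xmin) / w)
        for s in range(slo, min(shi, nstrips - 1) + 1):
            strips[s].append(i)
    return xmin, strips

def segments_near(p, d, xmin, w, strips):
    # Candidate segment indices whose strip lies within d of p in x
    slo = max(0, int((p[0] - d - xmin) / w))
    shi = min(len(strips) - 1, int((p[0] + d - xmin) / w))
    return sorted({i for s in range(slo, shi + 1) for i in strips[s]})
```

The distance query then only runs `DistanceToLine` over the candidate list, shrinking `d` as it goes, exactly as in the grouped version.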

This is related to a common school mathematics question where you drop a pencil at random onto a wooden floor made of boards whose width equals the length of the pencil, and ask: What is the probability that the pencil will cross the line between two boards? The answer is `2/pi`, and it requires you to know the integral of `arccos` to find it out.
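The `2/pi` answer is easy to check by simulation: dropping the pencil amounts to picking, at random, the distance from its centre to the nearest board edge and the angle it makes with the boards.

```python
import math, random

random.seed(0)
N = 100_000
crossings = 0
for _ in range(N):
    # Board width and pencil length both 1
    d = random.random() * 0.5              # centre to nearest board edge
    theta = random.random() * math.pi / 2  # angle against the board direction
    if d <= 0.5 * math.sin(theta):         # half the pencil reaches the edge
        crossings += 1
print(crossings / N)   # should come out near 2/pi = 0.6366...
```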

Now suppose you drop a pencil of length `r` onto a square tiled floor where tiles are of width `t`. What is the average number of tiles the pencil will cross? The answer to this would be part of the study on the theory of boxing which would tell you how to choose the optimal values in these algorithms. It gets very complicated, because the line segments of a path are not random; they are chained, so the probability that one crosses the boundary of a box affects the probability that the next one will.
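For a randomly dropped straight pencil the expected count can be worked out by counting expected grid-line crossings: one tile, plus the expected vertical and horizontal lines crossed, giving `1 + 4r/(pi*t)`. A quick simulation to check, with `r = t = 1`:

```python
import math, random

random.seed(0)
r, t = 1.0, 1.0    # pencil length, tile width
N = 200_000
total = 0
for _ in range(N):
    x, y = random.random() * t, random.random() * t
    theta = random.random() * 2 * math.pi
    x2, y2 = x + r * math.cos(theta), y + r * math.sin(theta)
    # Tiles touched = 1 + vertical grid lines crossed + horizontal ones
    total += 1 + abs(math.floor(x2 / t) - math.floor(x / t)) \
               + abs(math.floor(y2 / t) - math.floor(y / t))
print(total / N, 1 + 4 * r / (math.pi * t))   # the two should nearly agree
```

The chained-segment case is exactly where this independence argument breaks down, which is the author’s point.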

I haven’t done any of this. It’s what people in universities ought to be writing papers about. When I program I just throw in some values plucked out of the air, like 5mm, or 3 times the average length of a path segment, and that’s what it remains set at in the system for decades. All it takes is one university student to spot that this is a relevant problem that goes somewhere, work it out, test it, put it on the net, and we can all use it. If there was any justice, doing something like that would make it very easy to get a programming job in CADCAM, more so than good exam results.

Last week the tip of the little finger of my right hand got badly hit by the squid (the puck) while playing octopush. It’s still too painful to type with.

• 1. SJ replies at 4th November 2006, 2:38 pm :

How many segments does the toolpath typically contain? How many would
the largest one you would even attempt to manipulate contain?

Assuming all the data fits in memory and you can afford a little bit
of extra storage, try a divide-and-conquer approach using axis-aligned
bounding boxes around sections of your toolpath. The corners of such
a bounding box yield both upper and lower bounds on the distance
corresponding to a closest point lying on the section it contains.
This means that you can discard altogether the contents of any box
whose lower bound is greater than the upper bound of another box.
Keep the still-active boxes in a queue and process in breadth-first
search order – if the section the box contains has more than N
segments in it, split it in half, recalculate the (usually tighter)
bounds for each half, and put them back in the queue. A box with
fewer than N segments in it you can just go through and find the
closest point inside. As you go on the best bounds will tighten and
more sections will get discarded without any further examination, job
done. You can keep the full tree of boxes around for subsequent
queries, if you want, saving a bit of rescanning. The breadth-first
search will tend to tighten the global bounds on the answer sooner,
and can easily be farmed out to multiple processors. Choose N to tune
the tradeoffs between the different costs involved (you might be able

• 2. Julian replies at 13th November 2006, 11:56 am :

The example I put in here was the simplest implementation of a boxing algorithm I could imagine. It’s nowhere near the most efficient; it’s just short, but complex enough to demonstrate many of the characteristics of these algorithms.

As I said, there are lots of obvious approaches to boxing, and they all lead to literally endless amounts of coding. It’s difficult to tell which is going to be the most efficient approach before you implement it fully one way or another. No one bothers to study it, because once you’ve picked one technique and invested years into developing it, you don’t want to consider the alternatives because you might not like the answer.

I believe the truth will come from people who have a lot of curiosity about all the different ways to put together and optimize the different boxing algorithms, rather than from anyone who thinks they can see the best way to do it on first inspection.

The fact is, any boxing set-up you can code will be so much better than not having it, that it will always be a giant leap forward. Sometimes there are further elaborations that make it even better. But I have yet to see anyone apply any science to this art.