Freesteel Blog » The Queue of Queues Fallacy

The Queue of Queues Fallacy

Friday, May 24th, 2013 at 4:56 pm Written by:

Believe it or not, I’ve been working on the Adaptive Clearing strategy recently, attempting to upgrade it to have multi-core capabilities. As you know, the free lunch was declared over in 2005 when CPU clock speeds stopped increasing exponentially. Now, instead, the number of cores is increasing. I have 8 cores on my current laptop, and when my state-of-the-art algorithm runs on it using only 12.5% of the computing capability, it’s embarrassing.

To break the algorithm down into components that can be processed concurrently, I have separated it into 7 working threads, each of which pipes objects to the next layer through a queue. I was told that this is pretty standard architecture, even though it doesn’t look like it can be done using the concurrent.futures module in Python (where I looked for inspiration), because that module appeared to need the complete list of tasks before it could distribute them (at least in an earlier incarnation), rather than accepting a pipeline of data. I was also confused by the use of the word “futures” for processes not yet complete, which reminded me of the __future__ module, which back-ports features from a later version of Python into an earlier one. I don’t like the terminology anyway, but that’s just my opinion.

Anyway, a Queue is a structure that has functions queue.put(object) [to the front] and queue.get() [returns an object from the back], where each end can be handled by a different operator, like this:

Let’s say op1 is sending in horizontal slices of the model (the objects obj) and op2 is calculating the clearing toolpath at each of the horizontal layers. We can run these two operations in different threads, and it will all work tickety-boo. The operating characteristic of threads getting from and putting into queues is that when a thread gets from a queue that is empty, it automatically sleeps (consuming no processor cycles) until something is put into the queue at the other end, at which point it automatically wakes up and retrieves the object. The opposite happens on the put if you have specified an upper bound for the capacity of the queue and you hit it.
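Here’s a minimal sketch of that two-thread pipeline using Python’s queue and threading modules. The functions make_slice and clear_slice are hypothetical stand-ins for the real slicing and clearing operations, not anything from the actual code:

```python
import queue
import threading

# Hypothetical stand-ins for the real slicing and clearing operations
def make_slice(z):
    return ('slice', z)

def clear_slice(obj):
    return ('toolpath', obj[1])

Queue1 = queue.Queue(maxsize=4)   # bounded: op1 blocks on put when op2 falls behind

def op1():
    for z in range(10):
        Queue1.put(make_slice(z))   # blocks if the queue is full
    Queue1.put('end')               # sentinel so op2 knows to stop

results = []

def op2():
    while True:
        obj = Queue1.get()          # sleeps until op1 puts something
        if obj == 'end':
            break
        results.append(clear_slice(obj))

t1 = threading.Thread(target=op1)
t2 = threading.Thread(target=op2)
t1.start()
t2.start()
t1.join()
t2.join()
print(results[0], results[-1])   # ('toolpath', 0) ('toolpath', 9)
```

The bounded capacity on Queue1 is what gives you back-pressure: a fast producer can never run arbitrarily far ahead of a slow consumer.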

Let’s look at the picture from the point of one thread in the middle getting from an input queue Queue1 and putting to an output queue Queue2.

And here’s what it looks like when I set up two threads getting from a single input queue and putting to a single output queue.

Note that if both threads (both performing operation op1) are getting from Queue1 when it is empty (and therefore they are both sleeping), only one of them will wake and get an object that is put into it at the other end.

The problem here is that the objects can potentially be put into Queue2 out of order, for example if obj2 happens to complete in half the time of obj1.
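A quick way to see this is to give the two op1 workers jobs of different durations; the quicker job usually lands in Queue2 first. The sleeps here are artificial stand-ins for uneven computation times, and none of these names come from the real code:

```python
import queue
import threading
import time

Queue1 = queue.Queue()
Queue2 = queue.Queue()

def op1():
    while True:
        obj = Queue1.get()
        if obj is None:
            break               # sentinel: this worker is done
        seq, cost = obj
        time.sleep(cost)        # pretend the computation takes this long
        Queue2.put(seq)

Queue1.put((1, 0.2))            # obj1: the slow job
Queue1.put((2, 0.05))           # obj2: the quick job
Queue1.put(None)
Queue1.put(None)                # one sentinel per worker

workers = [threading.Thread(target=op1) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()

out = [Queue2.get(), Queue2.get()]
print(out)                      # usually [2, 1]: out of order
```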

To deal with this situation, I designed the following complicated structure to keep everything in order by embedding a miniature queue into each object in the queue, and forming a second queue known as the Queue of Queues.

Not shown is op0 which is generating each objn containing each corresponding Qn, and putting them both into their respective queues. Then every time op1 executes and has its output, it puts it into the Qn that came with the object.

All that op2 now has to do is unpack the queue of queues, by doing Queue2.get().get() instead of plain Queue2.get(). In this way it first gets the miniature output queue that was created with the object, and then hangs on that queue until the output of that object is put.
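Here’s a sketch of that wiring, simplified to one output per input object. The op0/op1/op2 bodies (and the multiplication by 10 standing in for the real computation) are illustrative assumptions, not the production code:

```python
import queue
import threading

Queue1 = queue.Queue()   # carries (obj, Qn) pairs to op1
Queue2 = queue.Queue()   # carries the miniature queues Qn, in creation order

def op0(objs, nworkers):
    for obj in objs:
        qn = queue.Queue()           # miniature queue travelling with the object
        Queue1.put((obj, qn))
        Queue2.put(qn)               # Queue2 records the original order
    for _ in range(nworkers):
        Queue1.put(None)             # one sentinel per op1 worker

def op1():
    while True:
        item = Queue1.get()
        if item is None:
            break
        obj, qn = item
        qn.put(obj * 10)             # stand-in for the real computation

results = []

def op2(n):
    for _ in range(n):
        results.append(Queue2.get().get())   # get the Qn, then hang on it

threads = [threading.Thread(target=op0, args=([1, 2, 3], 2))]
threads += [threading.Thread(target=op1) for _ in range(2)]
threads += [threading.Thread(target=op2, args=(3,))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # [10, 20, 30] in order, however the op1 workers interleave
```

The order is preserved because Queue2 holds the Qn queues in creation order, and op2 blocks on each one in turn until its result arrives.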

This is over-complex and stupid, but seemed obvious at the time in my actual case where multiple objects (toolpath segments) were put into each output queue for each single input object (the slice for which all those toolpath segments are to be generated). In this case each of the miniature queues needed a termination object so op2 knew to advance to the next queue corresponding to the next object.

So the code for op2 looked more like:

while True:
    q = Queue2.get()
    while True:
        obj = q.get()
        if obj == 'end':
            break
        process(obj)

Like I said, I was really proud of this complex structure, until I decided that I might like to convert it into C++ (for reasons I’ll explain at a later date), and realised it was looking like a lot of work. C++ is not as good as Python for putting together complex dynamic structures like this.

Why don’t I — I said to myself — sequentially number the objects in Queue1 when I run my concurrent versions of op1 and simply reorder them in op2?

Now we go back to the simpler version. Let obj.sequence be a number starting at 0 that increments by 1 for each object in Queue1.

Then the reordering code in op2 looks like:

current_sequence = 0
obj_buffer = { }
while True:
    if current_sequence in obj_buffer:
        obj = obj_buffer.pop(current_sequence)
        process(obj)
        current_sequence += 1
    else:
        obj = Queue2.get()
        obj_buffer[obj.sequence] = obj

Note that if the queue is completely in order, we go through the loop twice for each object, first in the else block, and second in the if block.
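Putting it all together, here’s a runnable sketch of the whole sequence-numbering scheme, with two op1 workers jittered by random sleeps so their outputs can genuinely arrive out of order. The Obj class and the sleeps are illustrative assumptions, not the production code:

```python
import queue
import random
import threading
import time

Queue1 = queue.Queue()
Queue2 = queue.Queue()

class Obj:
    def __init__(self, sequence):
        self.sequence = sequence

def op1():
    while True:
        obj = Queue1.get()
        if obj is None:
            break                              # sentinel: worker is done
        time.sleep(random.uniform(0, 0.01))    # jitter so outputs can reorder
        Queue2.put(obj)

N = 20
for i in range(N):
    Queue1.put(Obj(i))
for _ in range(2):
    Queue1.put(None)                           # one sentinel per worker

workers = [threading.Thread(target=op1) for _ in range(2)]
for w in workers:
    w.start()

# op2: restore the original order using the sequence numbers
order = []
current_sequence = 0
obj_buffer = {}
while current_sequence < N:
    if current_sequence in obj_buffer:
        obj = obj_buffer.pop(current_sequence)
        order.append(obj.sequence)             # stand-in for process(obj)
        current_sequence += 1
    else:
        obj = Queue2.get()
        obj_buffer[obj.sequence] = obj

for w in workers:
    w.join()
print(order == list(range(N)))   # prints True
```

The obj_buffer never grows beyond the number of objects that have overtaken the one op2 is waiting for, so in practice it stays small.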

The code is not much more complex when we have multiple objects being produced by op1 for each input object, but I won’t bore you. The Queue object and everything associated with it is much simpler and is working happily in the production code.

I have stripped out one of the two instances of the Queue of Queues from the code, but not the bigger one yet. A lot of code will be removed in the process. I love deleting code.
