Freesteel Blog » Concurrent clashing of piping resources

Sunday, October 13th, 2013 at 1:39 pm Written by:

So, we’re having lots of fun using Python’s subprocess module to kick off independently executed processes and transmit data to and from them through their stdin and stdout pipes.

The problem we had was that the data describing the clearing work in each level was encoded in a big, disorganized C++ object with lots of values, pointers, arrays and junk like that, and I was too lazy/wise to build a whole serialization protocol for it. I decided it was much easier to make a pure Python unpacking (and repacking) of the C++ object and then simply encode it into, and decode it from, a stream of bytes with the pickle module.
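To illustrate the unpack-then-pickle idea, here is a minimal sketch in Python 3, where `pickle` replaces Python 2’s `cPickle`. The real C++ object isn’t shown, so `ZLevelJob`, `Toolpath`, `make_pure_python_copy` and `rebuild` are hypothetical stand-ins. They are ordinary Python classes here (so they could be pickled directly); they only play the part of the C++ object. The point is that once everything is reduced to built-in dicts, lists, tuples and floats, `pickle` turns it into bytes and back without a hand-written protocol.

```python
import pickle
from dataclasses import dataclass, field
from typing import List, Tuple


# Hypothetical stand-ins for the C++ object; the real layout is not shown.
@dataclass
class Toolpath:
    z: float
    points: List[Tuple[float, float]]


@dataclass
class ZLevelJob:
    level: int
    boundary: List[Tuple[float, float]]
    toolpaths: List[Toolpath] = field(default_factory=list)


def make_pure_python_copy(job: ZLevelJob) -> dict:
    """Flatten the job into built-in types only (dict, list, tuple, float, int)."""
    return {
        "level": job.level,
        "boundary": [tuple(p) for p in job.boundary],
        "toolpaths": [(tp.z, [tuple(p) for p in tp.points]) for tp in job.toolpaths],
    }


def rebuild(data: dict) -> ZLevelJob:
    """Inverse of make_pure_python_copy, as the receiving process would do it."""
    return ZLevelJob(
        level=data["level"],
        boundary=list(data["boundary"]),
        toolpaths=[Toolpath(z, list(pts)) for z, pts in data["toolpaths"]],
    )


job = ZLevelJob(3, [(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)],
                [Toolpath(-2.5, [(1.0, 1.0), (9.0, 1.0)])])
blob = pickle.dumps(make_pure_python_copy(job), pickle.HIGHEST_PROTOCOL)
assert rebuild(pickle.loads(blob)) == job   # the round trip is lossless
```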

import subprocess
import cPickle
pyzleveljob = MakePurePythonCopyOfDataZlevelCobject(zleveljob)  # PyWorkZLevel_pickle  yellow
sout = cPickle.dumps(pyzleveljob, cPickle.HIGHEST_PROTOCOL)     # just_pickle  red
p = subprocess.Popen(["python", "-u", ""], 
                      stdin=subprocess.PIPE, stdout=subprocess.PIPE, universal_newlines=False)
p.stdin.write(sout)   # hand the pickled bytes to the subprocess
p.stdin.close()       # the subprocess starts executing here at popen  blue

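That’s obviously the simplified code. The -u option forces the child’s stdin, stdout and stderr to be unbuffered and, on systems where it matters (such as Windows), puts them in binary mode, while universal_newlines=False keeps the parent’s ends of the pipes binary, so the pickled bytes pass through unaltered at both ends.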
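For a self-contained picture of the whole round trip, here is a Python 3 sketch. The child program `CHILD` and the helper `run_in_child` are hypothetical placeholders (the real worker script’s name is elided above). It launches `sys.executable` rather than a bare `python` so the child uses the same interpreter, and in Python 3 the child reads and writes the binary layers `sys.stdin.buffer` and `sys.stdout.buffer` explicitly.

```python
import pickle
import subprocess
import sys

# Hypothetical child program: read one pickle from binary stdin,
# write a small pickled reply to binary stdout.
CHILD = r"""
import pickle, sys
job = pickle.load(sys.stdin.buffer)
reply = {"level": job["level"], "npoints": len(job["points"])}
pickle.dump(reply, sys.stdout.buffer, pickle.HIGHEST_PROTOCOL)
sys.stdout.buffer.flush()
"""


def run_in_child(job):
    p = subprocess.Popen([sys.executable, "-u", "-c", CHILD],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    p.stdin.write(pickle.dumps(job, pickle.HIGHEST_PROTOCOL))
    p.stdin.close()                 # EOF: no more input for the child
    reply = pickle.load(p.stdout)   # read the child's pickled answer
    p.stdout.close()
    p.wait()
    return reply


result = run_in_child({"level": 2, "points": [(0.0, 0.0), (1.0, 2.0)]})
print(result)
```

This works because the child reads all of its input before writing anything. If both sides might produce large amounts of data at once, `p.communicate(data)` is the safer choice, since it avoids filling one pipe while blocked on the other.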

Here’s what the timing graph looks like for seven concurrent process threads, with time running from left to right.

The zleveljob work doesn’t get much bigger the further down into the job you go (in fact it should get smaller as the area being cut shrinks), so I couldn’t explain why the MakePurePythonCopyOfDataZlevelCobject() function was taking longer in each successive thread. Maybe it was a failure of concurrency, with the threads all clashing over the same Python resource and blocking one another.

It’s the same reason your group doesn’t get served food twice as fast by splitting into two tables when there is only one chef in the back kitchen.

Just to try this out, I added a time.sleep(threadnumber*0.5) in front of the code above to stagger the work, and with that they all took roughly the same time.
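The staggering experiment can be reproduced in miniature with plain threads. This Python 3 sketch assumes that the “same Python resource” is the interpreter’s global lock (GIL), which lets only one thread run Python bytecode at a time on a standard CPython build. `pure_python_work` and `run_threads` are hypothetical stand-ins for the real per-level work. The stagger is 0.1 s rather than the 0.5 s used above, only to keep the demo short.

```python
import threading
import time


def pure_python_work(n=200_000):
    # Hypothetical CPU-bound pure-Python loop; it needs the GIL the whole time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_threads(nthreads, stagger):
    """Start nthreads threads; thread k first sleeps k*stagger seconds.
    Returns the wall-clock duration of the work measured inside each thread."""
    durations = [0.0] * nthreads

    def worker(k):
        time.sleep(k * stagger)          # like time.sleep(threadnumber*0.5)
        t0 = time.perf_counter()
        pure_python_work()
        durations[k] = time.perf_counter() - t0

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return durations


together = run_threads(7, 0.0)
staggered = run_threads(7, 0.1)
print("together: ", ["%.3f" % d for d in together])
print("staggered:", ["%.3f" % d for d in staggered])
```

On a standard CPython build, you would expect the “together” durations to come out noticeably longer than the staggered ones, because the threads take turns on one lock. The exact numbers depend on the machine.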

I decided to change things around a bit and use the streaming version of cPickle. This appeals more to the programmer in me: IO is slow, so you might as well start sending the first sections of the data out as soon as you can, rather than waiting for the whole byte string to be built as one blob before anything gets transmitted.
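The difference between the two styles can be seen by pickling into a file-like object that records every `write()` it receives. `RecordingWriter` is a hypothetical stand-in for `p.stdin`, and the sketch uses Python 3’s `pickle`. In recent CPython versions with protocol 4 or higher, the pickler commits output in frames of roughly 64 KB and flushes each one to the file as it goes, so `dump()` on a large object typically issues several writes. The exact count depends on the Python version and protocol, which is why the sketch only prints it.

```python
import pickle


class RecordingWriter:
    """File-like object that keeps every chunk passed to write()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


# Hypothetical payload: a largish list of coordinate tuples.
payload = [(float(i), float(i) * 0.5, -1.0) for i in range(100_000)]

w = RecordingWriter()
pickle.dump(payload, w, pickle.HIGHEST_PROTOCOL)       # streamed out as it is produced
blob = pickle.dumps(payload, pickle.HIGHEST_PROTOCOL)  # built as one blob first

print("dump() made", len(w.chunks), "write calls; largest was",
      max(len(c) for c in w.chunks), "bytes")
print("dumps() built a single", len(blob), "byte string")
```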

Here is the rejigged code:

import subprocess
import cPickle
pyzleveljob = MakePurePythonCopyOfDataZlevelCobject(zleveljob)       # PyWorkZLevel_pickle  yellow
p = subprocess.Popen(["python", "-u", ""], 
                      stdin=subprocess.PIPE, stdout=subprocess.PIPE, universal_newlines=False)
cPickle.dump(pyzleveljob, p.stdin, cPickle.HIGHEST_PROTOCOL)  # just_pickle  red (streams straight into the pipe; returns None)
p.stdin.close()       # the subprocess starts executing here at popen  blue

Even though the MakePurePythonCopyOfDataZlevelCobject() calls still overlap significantly, they no longer cause such a radical extension of time. But the streaming cPickle.dump() now seems to take too long to complete, even though it is IO bound.

What happens when we stagger these:

It takes barely any time. I don’t know what’s going on in the back kitchen. It’s hard to find out.

I almost wasted a lot of time optimizing these functions and rewriting them in C++ with complex, bug-riddled binary protocols. This shows that once these processes get out of sync with one another, the clash is not going to make much of a difference.

There are many other stages in the Adaptive Clearing strategy holding up final completion, and we should look at those first if we want to speed it up. I’ve already lived through enough nightmares where I spent a month doubling the complexity of the system, only for it to make everything slower in the end.

Something didn’t look quite right with that first picture, so I zoomed in on it and found that the real, most visible blocking occurs at the cPickle stage, not in MakePurePythonCopyOfDataZlevelCobject(). You can see it in the sudden halt of activity during those phases:

In other words, the MakePurePythonCopyOfDataZlevelCobject() calls aren’t holding one another back so much as cPickle is holding back everything, including whichever MakePurePythonCopyOfDataZlevelCobject() calls happen to be running at the same time. That was my mistake. I had suspected as much ever since reading the blog post “beware of cpickle”.
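One way to watch this kind of freeze is a heartbeat thread that stamps the time every millisecond while another thread pickles. This is a Python 3 sketch assuming a standard CPython build, where the C pickler working through built-in types typically does not release the GIL during a single `dumps()` call. If that assumption holds, the longest gap between heartbeats should come out close to the `dumps()` time, meaning every other Python thread was frozen for its duration. `longest_stall_during`, `timed_dumps` and the payload are hypothetical, not taken from the original code. The numbers will vary by machine and Python version, and the baseline gap also depends on the OS timer resolution.

```python
import pickle
import threading
import time


def longest_stall_during(fn, tick=0.001):
    """Run fn() on the current thread while a heartbeat thread records a
    timestamp roughly every `tick` seconds. Returns (fn's result,
    longest gap in seconds between consecutive heartbeats)."""
    stamps = []
    done = threading.Event()

    def heartbeat():
        while not done.is_set():
            stamps.append(time.perf_counter())
            time.sleep(tick)

    hb = threading.Thread(target=heartbeat)
    hb.start()
    time.sleep(0.05)   # let the heartbeat settle before starting the work
    result = fn()
    time.sleep(0.05)   # record a few beats after the work finishes
    done.set()
    hb.join()
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    return result, max(gaps, default=0.0)


def timed_dumps(obj):
    """Pickle obj in memory (like cPickle.dumps) and return the elapsed time."""
    t0 = time.perf_counter()
    pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    return time.perf_counter() - t0


# Hypothetical payload of built-in types only, like the pure-Python job copy.
payload = [{"z": float(i), "pts": [(i, j) for j in range(10)]} for i in range(20_000)]

elapsed, stall = longest_stall_during(lambda: timed_dumps(payload))
print("dumps took %.3f s; longest heartbeat gap %.3f s" % (elapsed, stall))
```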

The plan now is to run this on the 16-core Z820 and see whether it gets us into the required 10x improvement realm.
