Wednesday, July 31, 2019

To go fast, you have to share

The paper I'm going to tell you about today is a nice advance for aggregate computing, but also one that I feel quite ambivalent about. On the one hand, the new "share" operator that it introduces is a nice piece of work that doubles the speed of pretty much every aggregate program that we've implemented. On the other hand, it's fixing a problem that I find embarrassing and wish we'd found a way to deal with long ago.

The origin of this problem goes back a long, long way, all the way to my first serious publication on engineering distributed systems, Infrastructure for Engineered Emergence on Sensor/Actuator Networks, published more than a decade ago in 2006 back in the middle of grad school. This is the publication that started my whole line of work with Proto, spatial computing, and aggregate computing. Unfortunately, it contains a subtle but important flaw: we separated memory and communication.

In principle, this makes a lot of sense, and it's the way that nearly every networking system has been constructed: sending information to your neighbors is, after all, a different sort of thing than remembering a piece of state for yourself. But this choice ended up injecting a subtle delay: when a program receives shared information with "nbr", it has to remember the value with "rep" before it can share the information onward in its next "nbr" execution. Every step of propagating a calculation thus gets an extra round of delay, though it never really mattered much when we were operating more in theory and simulation and assuming fast iterations.

Handling sharing ("nbr") and memory ("rep") separately injects an extra round of delay while information "loops around" to where it can be shared.  Combining them into the single "share" operator eliminates that delay.

Now that we're doing more deployments on real hardware, however, there's often good reasons to keep executions slower in order to save network capacity. And that, finally, has motivated us to fix the delay by combining the "nbr" and "rep" operations into a single unified "share" operation that sends the value stored to one's neighbors.

Theoretically, it's elegant, since this one operation can actually implement both of the previous separate functionalities. Pragmatically, it's a lifesaver, since pretty much every program we run just started converging at least twice as fast, if not faster.  I also wonder how many other distributed algorithms built by other people have this subtle flaw hiding inside of them---though most algorithms probably won't just because they're implemented so much more at the "assembly language" level in how they handle interactions, and the humans implementing them will likely have spotted the optimization opportunity and taken it.

So, all's well that ends well, I guess. I just wish I'd thought of this a decade back.

No comments: