Temporary exile

Why Must It Be So Hard to Cluster?

In the past, I’ve gotten pretty upset by how difficult it is to take advantage of multiple computers on a network for general tasks. In this age of advanced Linux software, I’m still shocked at how hard it is to cluster machines.

Let’s say I have three or so machines on my local network. If the task is something commonplace like encoding audio or compiling, I can use distmp3 or distcc, respectively. Alternatively, if I want to share disk space among nodes, I could use a clustered file system such as Lustre or GFS. Beyond that, I’d have to put together a more formal cluster with something like OpenMosix (now abandoned), OpenSSI, Kerrighed (comparison paper, PDF) or some other option. The next step is to write my own applications to do something explicitly parallel using any number of options like OpenMP or PVM, along with trendy stuff like Hadoop and MapReduce. I can always opt for just doing it by hand using distributed objects for a given language. Apropos, Ruby has positively stellar support for distributed objects, including Rinda, an implementation of tuple spaces (à la Linda) which provides nifty things (auto-discovery, among other features). Still, these options don’t help me build a general-usage cluster out of machines. Then there are the tools to control the actions of the machines remotely, like clusterssh, dsh and gsh.

So far, my options are:

  1. Settle for limited capabilities that only cover select tasks.
  2. Write my own app to do something (or everything, which is a bad
  3. Deal with it and control actions using a remote, group-admin tool.

I understand how the landscape could reach such a state, but I don’t like the fact that this is the same set of options I’ve had for the last five years or so. Are there options I’m overlooking? Is there something I don’t know about? The only thing I can see down the pipeline is GNU Queue (got a tipoff from mct) which might very well be exactly what I’ve been dreaming of. Unfortunately, no releases have yet been made, so certainly no chance of using it now.
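
As an aside on the tuple-space model mentioned earlier (Rinda, à la Linda), here is roughly what the idea looks like in miniature. This is only a sketch in Python, not Rinda's API: the `TupleSpace` class, its `write`/`take` methods, and the `square_worker` and `run_demo` helpers are all names made up for illustration, and everything runs in one process with threads standing in for machines. A real tuple space would sit behind a network service so that separate nodes could share it.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space (hypothetical; not Rinda's actual API).

    A pattern matches a tuple of the same length where every field is
    either equal or None in the pattern (None acts as a wildcard).
    """

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        """Add a tuple and wake up anyone blocked in take()."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    def _find(self, pattern):
        """Return the first stored tuple matching pattern, or None."""
        for tup in self._tuples:
            if len(tup) == len(pattern) and all(
                p is None or p == t for p, t in zip(pattern, tup)
            ):
                return tup
        return None

    def take(self, pattern, timeout=None):
        """Remove and return a matching tuple, blocking until one exists."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                raise TimeoutError(f"no tuple matching {pattern!r}")
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup


def square_worker(space, count):
    """Stand-in for a cluster node: take a task, write back a result."""
    for _ in range(count):
        _, n = space.take(("task", None))
        space.write(("result", n, n * n))


def run_demo():
    space = TupleSpace()
    workers = [
        threading.Thread(target=square_worker, args=(space, 2))
        for _ in range(2)
    ]
    for w in workers:
        w.start()
    for n in range(4):
        space.write(("task", n))
    results = {}
    for _ in range(4):
        _, n, squared = space.take(("result", None, None), timeout=5)
        results[n] = squared
    for w in workers:
        w.join()
    return results


if __name__ == "__main__":
    print(run_demo())
```

The appeal is that the producer never names a worker: it drops tasks into the space and whoever is free picks them up. That is about as close to "general" as the do-it-yourself route gets, but it still falls under option 2 above.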

