
Can I Get a Little MapReduce From My Debian People?

Debian is a world-class Linux distribution. It is used on its own for so many applications (desktop, laptop, workstation, handheld, server, etc.) as well as being the foundation for so many wonderful projects ((U|K|X)buntu, Maemo, etc.). Personally, I run Debian on my laptop as well as my servers.

So, when I went to see about setting up a little ad-hoc cluster, I was rather disappointed. Though there are a few clustering tools available, as well as several distributed filesystems (GFS, GlusterFS, OCFS2, and Lustre), shockingly, I could not find any implementation of MapReduce in the Debian repositories. For those who might not know, MapReduce is a novel data-processing system developed by Google for internal use and described in their publication entitled MapReduce: Simplified Data Processing on Large Clusters. For the enlightened out there, it should be clear that the name and mechanism are derived from Lisp's map and reduce functions. In any case, though Google's implementation is proprietary, several implementations based on their paper have appeared, written in and geared toward a variety of programming languages. Unfortunately, none of these are available in the Debian repositories. In all fairness, Debian does include CouchDB, which uses map and reduce functions to generate views. However, it's not a solution aimed at sorting and processing huge amounts of data, though it is an interesting and capable piece of software.
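For anyone who hasn't seen the pattern in action, here is a toy, purely in-memory illustration in (modern) Java: the "map" step splits each document into words, and grouping plus counting plays the part of "reduce". This is just a sketch of the idea; the class name and sample data are made up and have nothing to do with any particular package.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy word count: map each document to its words, then group identical
// words and count them (the "reduce" step). Purely illustrative.
public class TinyMapReduce {
    public static void main(String[] args) {
        List<String> documents = Arrays.asList("the quick brown fox", "the lazy dog");

        Map<String, Long> wordCounts = documents.stream()
            .flatMap(doc -> Arrays.stream(doc.split("\\s+")))       // map: document -> words
            .collect(Collectors.groupingBy(Function.identity(),     // group by key (the word)
                                           Collectors.counting())); // reduce: count per word

        System.out.println(wordCounts); // e.g. {the=2, quick=1, brown=1, ...}
    }
}
```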

So, to try and get things moving, I have filed three Debian RFPs (Request For Package) for a few separate MapReduce implementations.

- Hadoop - Probably the most well-known of the Free/Open Source
  implementations. Includes a distributed filesystem (HDFS), a
  scalable distributed database (HBase) and tools to get you going
  from start to finish. Hadoop is written in Java, though it can
  interoperate with other languages
  ([Scala](http://scala-blogs.org/2008/09/scalable-language-and-scalable.html),
  too). It's a top-level project of the
  [Apache Software Foundation](http://www.apache.org/) and licensed
  under the
  [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0.html)
  - [http://hadoop.apache.org](http://hadoop.apache.org/)
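To give a flavor of what a packaged Hadoop would actually provide, here is a rough sketch of the classic word-count job written against the org.apache.hadoop.mapreduce API. It is the same computation as the toy example above, just expressed as a distributed job; exact signatures vary a bit between Hadoop releases, and the input and output paths are simply taken from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map: for every word on an input line, emit the pair (word, 1).
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce: sum the ones emitted for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count"); // newer releases use Job.getInstance(conf, ...)
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory (must not exist yet)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Bundled into a jar, something like `hadoop jar wordcount.jar WordCount input output` would run it across the cluster (the paths here are just placeholders).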

Ok, there might be a few objections to my choices. Why did I leave out neat projects like GridGain, FileMap and BashReduce? Well, for starters, GridGain is another Java implementation that doesn't seem (at least to me) to have the same momentum that Hadoop does. FileMap and BashReduce, while novel, useful and fascinating, are not designed for use in networked environments and are therefore unsuitable for cluster situations. So then why not MapSharp? Well, primarily because of all the Debian Mono debates going on right now (Gnome's fail!). I've done work in C# and it has some neat features, but cool features don't and won't ensure that users are safe from patent litigation. Also, it seems like those RFPs have some mistakes, so if anyone figures out how to edit them, let me know so I can clean them up.
