When people think about platforms for handling big data, the first terms that come to mind will probably be “Hadoop” and “MapReduce”. However, there are other projects out there, such as Flat Datacenter Storage (FDS) from Microsoft Research. In a recent Microsoft Research article, “Data in the Fast Lane”, this new platform was described as “a radically different approach to sorting” from the 2004 Google research project that led to MapReduce. The problem, as the Microsoft Research team highlight, is that the existing approach to big data relies on sending programs to the data rather than moving the data itself. That approach reaches its limits when the data does need to be moved, for example when you want to join two large data sets for analysis.
Another alternative is the open source High Performance Computing Cluster (HPCC) from HPCC Systems, a division of LexisNexis. In HPCC, the Data Refinery Cluster “Thor” and the Query Cluster “Roxie” together act as an alternative to Apache Hadoop, but differentiate themselves by aiming to avoid the MapReduce bottlenecks where, as HPCC Systems say, “each of the phases for these cycles cannot be started until the previous phase has completed for every record.”
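To make the quoted bottleneck concrete, here is a minimal single-process sketch in Python of a word count written in the MapReduce style. It is only an illustration: it does not use the Hadoop or HPCC APIs, and the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical. The point to notice is that each phase returns only once it has processed every record, so the next phase cannot begin any earlier.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (word, 1) pair for every word in every record."""
    pairs = []
    for line in records:
        for word in line.split():
            pairs.append((word.lower(), 1))
    # The complete list is returned only after all records are mapped.
    return pairs


def shuffle_phase(pairs):
    """Group all emitted values by key; needs the full map output."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(records):
    # Each call below acts as a barrier: the next phase starts only
    # when the previous one has finished for every record.
    pairs = map_phase(records)
    grouped = shuffle_phase(pairs)
    return reduce_phase(grouped)


counts = word_count(["big data", "Big clusters", "data data"])
print(counts)
```

In a real cluster the same barrier means that one slow map task can hold up every reducer, which is the kind of delay the HPCC Systems quote describes.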