Why You Should Adopt Apache Spark over MapReduce

Adoption of Apache Spark is increasing. Its biggest attraction is that it is overwhelmingly faster than MapReduce, and people cite many other benefits besides. On the other hand, some argue against adopting Spark. Why?


Apache Spark (hereinafter Spark), a recently emerged framework that runs in the Apache Hadoop (hereinafter Hadoop) environment and excels at high-speed processing, is rapidly gaining adoption. At the very least, this trend amounts to a message from big data vendors, who are pushing Spark to the fore in the expectation that it will become the mainstream of the next big data world.

In June 2015, the Spark Summit conference was held in San Francisco. Speaking there, Mike Olson, Chief Strategy Officer of US-based Cloudera, described Spark's rapid growth as having “breathtaking, unexpected momentum.” He reported that what customers want has shifted completely from MapReduce to Spark, and that Cloudera sees this firsthand as a company that sells a Hadoop distribution.

“We predict that in the near future Spark will become the mainstream general-purpose processing framework for Hadoop,” he said. “If you are looking for a general-purpose engine today, you would choose Spark rather than MapReduce.”

Olson chose his words carefully when he made this remark. In particular, the phrase “general purpose” is a deliberate qualification. He put it that way because engines developed for specialized uses, such as Apache Solr for search and Cloudera Impala for SQL queries on Hadoop, still have no small role to play. But the contest among frameworks that developers use to build a wide variety of new analytics workloads is turning into a one-on-one fight, and Spark appears to have the upper hand.

The story is simple. Spark deftly addresses a number of problems that developers have faced with MapReduce and that have drawn repeated criticism, in particular its high latency, which stems from the slow response of batch-mode processing.

“As the Hadoop world has grown, MapReduce has long maintained a reputation for robustness,” says Alan Murphy, an architect and founder of US-based Hortonworks.

Murphy points out that MapReduce is a technology that came out of Google's labs and was meant for a very specific use case: processing web search. Although the technology has evolved repeatedly over more than 10 years, it probably still falls short of what large organizations require of big data applications.

“The advantage of MapReduce is its flexibility in addressing multiple use cases,” he adds. “MapReduce has been in use for a long time because there certainly are use cases it can solve. It just cannot be said to be the optimal way to handle them. MapReduce once displaced other technologies itself, so it is a very natural course of events for new technology to appear and push MapReduce aside or replace it.”

Execution speed and simplicity

So what exactly makes Spark so much better? For developers, the main advantage is processing speed.