MapR Hadoop

Posted by 我是彩笔 on 2014/11/07 10:06

When it comes to Hadoop distributions, enterprises care about a number of things, among them high performance, high availability, and API compatibility. MapR, a San Jose, Calif.-based start-up, is betting that enterprises are less concerned with whether a distribution is purely open source or includes proprietary components than with those three qualities. That’s according to Jack Norris, MapR’s vice president of marketing. He said MapR is the market leader in all three of the top Hadoop priorities – performance, availability, and API compatibility – and it has the customers to prove it.

Currently MapR has between 40 and 50 paying customers using its enterprise M5 Hadoop distribution, which, as everyone knows by now, includes the proprietary NFS storage layer. They include comScore, the online market intelligence firm, which recently dumped Cloudera’s Hadoop distribution for M5. In addition, the company’s free community distribution, M3, has been downloaded thousands of times, according to Norris.

MapR’s performance and availability advantages over competing Hadoop distributions, Norris explained, are due in part to:

  • M5’s distributed namenode architecture, which removes the single point of failure that plagues HDFS;

  • MapR’s Lockless Storage Services layer, which results in higher MapReduce throughput than competing distributions;

  • Its ability to run the equivalent number of jobs on fewer nodes, which results in overall lower TCO.
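
The third point is an arithmetic claim, and it holds only if the savings from running fewer nodes outweigh any license fee. The sketch below shows the shape of that comparison. Every figure and field name in it is a made-up placeholder chosen for illustration, not MapR, Cloudera, or hardware vendor pricing.

```python
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    """Hypothetical cost inputs for one Hadoop deployment option."""
    nodes: int                    # number of worker nodes
    hardware_per_node: float      # one-time purchase cost per node
    opex_per_node_year: float     # power, space, admin per node per year
    license_per_node_year: float  # 0.0 for a free distribution


def total_cost(plan: ClusterPlan, years: int) -> float:
    """Total cost of ownership over `years`, ignoring discounting."""
    capex = plan.nodes * plan.hardware_per_node
    yearly = plan.nodes * (plan.opex_per_node_year + plan.license_per_node_year)
    return capex + yearly * years


# Placeholder figures only: a free distribution on 100 nodes versus a
# licensed one assumed to run the same jobs on 60 nodes.
free_distro = ClusterPlan(nodes=100, hardware_per_node=5000.0,
                          opex_per_node_year=2000.0, license_per_node_year=0.0)
paid_distro = ClusterPlan(nodes=60, hardware_per_node=5000.0,
                          opex_per_node_year=2000.0, license_per_node_year=1000.0)

for name, plan in [("free", free_distro), ("paid", paid_distro)]:
    print(name, total_cost(plan, years=3))
```

With these placeholder inputs the licensed option comes out cheaper over three years, but raising the assumed license fee far enough reverses the result, so the node-count reduction and the real license price are the two numbers worth asking a vendor to substantiate.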


Figure 1 - MapR Hadoop Stack
Source: MapR 2011



But it’s the open source issue where MapR takes a lot of heat. Norris argues that MapR’s approach – improving upon an open source core with proprietary value-add components and services – is a pretty “standard” model in the commercial open source world. While that is a common commercial open source business model, many would argue that the storage layer in a Hadoop distribution is the core, not an add-on.

Norris also said what’s important is not whether a given Hadoop distribution is purely open source, but whether it is 100% API compatible with the Apache distribution, which M5 is. This, he said, means that while developers can’t fiddle with NFS, they can easily integrate MapR’s distribution with HBase, HDFS, and other Apache Hadoop components, as well as move data in and out of NFS should they choose to tap a different Hadoop distribution. This last point is particularly important. It means, according to MapR, that there is no greater risk for vendor lock-in with its Hadoop distribution than with any other.
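
One way to test the data-portability claim during an evaluation is to copy a directory out of the cluster and confirm that every file arrives intact. The sketch below assumes the cluster's storage is visible on the client as an ordinary mounted directory, as the NFS description above implies. It uses only the Python standard library, not any MapR API, and the function names are invented for this example.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def export_tree(src: Path, dst: Path) -> List[str]:
    """Copy src to dst and return relative paths whose contents differ."""
    shutil.copytree(src, dst, dirs_exist_ok=True)  # needs Python 3.8+
    mismatched = []
    for src_file in src.rglob("*"):
        if src_file.is_file():
            rel = src_file.relative_to(src)
            if file_digest(src_file) != file_digest(dst / rel):
                mismatched.append(str(rel))
    return mismatched


# Example call; both paths are hypothetical:
# problems = export_tree(Path("/mnt/cluster/warehouse"), Path("/backup/warehouse"))
```

An empty list means every copied file matched its source byte for byte. Timing the same call on a realistic data volume gives a rough sense of how expensive a move to another distribution would actually be.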

MapR’s focus on performance, availability, and API compatibility over open source code also comes through in its go-to-market strategy. MapR is not interested in educating the wider market about the benefits of Hadoop, as Cloudera and Hortonworks seem to be, according to Norris. Rather, MapR is targeting companies that are already using Hadoop or have made the decision to deploy Hadoop and are evaluating their distribution options. MapR also has a relationship with EMC to ship parts of its distribution with EMC Greenplum’s Hadoop offering.

Norris said MapR is targeting customers who already understand what Hadoop can do and want a highly available, enterprise-ready version that they can quickly deploy and easily integrate with other big data tools and technologies through open APIs. MapR’s target customers have already experimented with Cloudera or Apache, Norris explained, and are now ready to move Hadoop into production.

Fact Checking MapR’s Approach

Let’s consider MapR’s claims one-by-one.

  1. API compatibility is more important than open source code. As Hadoop goes mainstream, traditional enterprise users will be more interested in deploying stable, high-performance, enterprise-ready big data stacks than in hacking the Hadoop core. In the meantime, however, big data application developers are adamant that they need access to the source code to integrate their wares seamlessly with Hadoop. In the long term, this claim is probably accurate, but while Hadoop continues its rapid development, open source code is still a critical element for many.

  2. MapR provides better performance and availability than competing Hadoop distributions. It is certainly true that MapR’s distribution has demonstrated significant performance and speed improvements over “vanilla” Hadoop. That said, CIOs are growing less interested in “speeds and feeds” and more interested in how Hadoop can deliver real business value.

  3. Enterprises are at no higher risk for vendor lock-in with MapR than with competing Hadoop distributions. It will prove reassuring to potential MapR customers that moving data out of M5, should they choose to move to a different distribution, is no more difficult than with any other distribution thanks to M5’s API compatibility. Still, like Cloudera’s enterprise Hadoop distribution, M5 costs money. How much money an enterprise sinks into an M5 deployment will determine the cost-effectiveness of moving to a competing distribution. So the risk of vendor lock-in with MapR is probably on par with that of Cloudera, but higher than that of Hortonworks' distribution or the straight Apache Hadoop distribution.

MapR’s strategy carries with it a number of risks. The biggest is that Apache Hadoop catches up to M5 in performance and availability before M5 gains widespread adoption, thus nullifying its entire value proposition. Indeed, Apache contributors recently introduced HDFS federation to tackle the single-point-of-failure issue “by adding support for multiple namenodes/namespaces to HDFS file system.”
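
To make "multiple namenodes/namespaces" concrete, the toy model below routes each file path to the namenode that owns the longest matching prefix, in the spirit of a client-side mount table. It is an illustration only, not the HDFS API. The class name, the prefixes, and the host names are all invented for this sketch.

```python
from typing import Dict


class MountTable:
    """Toy client-side table mapping namespace prefixes to namenodes."""

    def __init__(self, mounts: Dict[str, str]) -> None:
        # Normalise prefixes so "/user/" and "/user" behave the same.
        self._mounts = {self._norm(p): nn for p, nn in mounts.items()}

    @staticmethod
    def _norm(path: str) -> str:
        return "/" + path.strip("/")

    def resolve(self, path: str) -> str:
        """Return the namenode owning `path`, using longest-prefix match."""
        path = self._norm(path)
        best = None
        for prefix in self._mounts:
            # Match whole path components only, so "/user" does not
            # capture "/users/...".
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                if best is None or len(prefix) > len(best):
                    best = prefix
        if best is None:
            raise KeyError(f"no namenode serves {path}")
        return self._mounts[best]


# Hypothetical layout: three namenodes, each owning part of the tree.
table = MountTable({
    "/user": "nn1.example.com",
    "/data": "nn2.example.com",
    "/data/logs": "nn3.example.com",
})
print(table.resolve("/data/logs/2011/app.log"))
print(table.resolve("/data/raw/events"))
```

In this model, losing one namenode makes only the prefixes it owns unreachable while the others keep serving their own parts of the tree. Whether that is enough to count as removing the single point of failure is exactly what the two camps dispute.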

Norris said that while MapR respects the competition, he doesn’t believe the Apache distribution is even close to reaching performance parity with M5. When it comes to the single-point-of-failure issue, for example, he said MapR’s distributed namenode is superior to namenode federation in that M5 “is self-healing, and no user intervention is needed at any point.” In any event, that is a judgment the community will make.

Another risk is that its message of performance/availability/compatibility over open source code never reaches CIOs, drowned out by the fervent open source Hadoop community as well as by marketing from competitors. Hortonworks, like most Benchmark-funded start-ups, is a marketing and PR machine, while Cloudera, with more than 100 paying customers, is double MapR’s size and is on the verge of becoming the de facto Hadoop distribution.

And don’t forget support services. Enterprises that deploy Hadoop want assurance that if there’s an issue with their cluster, the vendor is there ready and waiting to put out the fire with fast technical support and intervention.

The $10 billion question, then, is which of the three Hadoop distribution models will enterprises embrace. Cloudera differentiates its core open source Hadoop distribution with its proprietary management console, which the company updated just last week. Hortonworks is going to market as the only 100% open source commercial Apache Hadoop distribution and plans to make money on technical support services. MapR is betting enterprises serious about Hadoop will value its performance and availability advantages over open source code, with its API compatibility assuaging vendor lock-in concerns.

The race is on. For MapR to remain competitive, I believe it must take the following steps:

  • Develop deep and real partnerships with big data application vendors. Enterprises looking to capitalize on big data analytics are increasingly looking to application vendors that promise to deliver real business value from Hadoop. The more application vendors work closely with MapR, the more likely these vendors are to recommend MapR as the underlying Hadoop infrastructure.

  • Continue contributing to the community where it can. MapR recently established MapR Academy, an online resource for Hadoop training and education. MapR should continue efforts like MapR Academy, as well as contribute to the Apache Hadoop project when possible, to engender good-will in the Hadoop community.

  • Aggressively take its message of performance/availability/compatibility over open source code to enterprise CIOs and even CEOs, who are more interested in enterprise stability and performance than in whether a technology is open source. If MapR can convince executives that its Hadoop distribution is more powerful, safer, and more cost-effective than competing distributions, it has a chance to slow Cloudera’s and Hortonworks’ momentum and give itself a fighting chance to win the market.

Action Item: Enterprises evaluating MapR’s Hadoop distribution should demand proof points and customer references from the vendor that illustrate its open API claims, including the ability to easily move data into and out of its cluster. Enterprises looking to navigate the larger Hadoop distribution market should focus on which of the competing Hadoop approaches – Cloudera, Hortonworks, or MapR – brings the greatest business value with the lowest cost and least risk. As we’ve written before, for some enterprises the value of fast business impact on revenue or profit offered by MapR will outweigh the risks of higher capex and the inability to customize the code. For enterprises just beginning to learn about Big Data and the benefits of Hadoop, it may make more sense to adopt Cloudera’s or Hortonworks’ more open approach, betting that the performance improvements the community develops over time and the flexibility offered by an open source distribution will prove more valuable in the long term. Whatever option they choose, enterprises should stay up to speed with developments in the Hadoop community, as both open and proprietary improvements that can deliver real business value are being made to the technology at a fast clip.

categories

Big Data, Hadoop, MapR Technologies, Professional alerts

This article is reposted from: http://wikibon.org/wiki/v/MapR_Hadoop_Strategy_Stresses_Performance,_Availability_and_API_Compati...
