Is Hadoop secure for the enterprise?

Yulong_ · Published 2018/11/28 20:47

Reposted from <https://www.xplenty.com/blog/is-hadoop-secure-for-the-enterprise/>

Is Hadoop secure for the enterprise? This is the question that data analysts must answer if they want to bring Hadoop to large organizations.

While Hadoop has proved its power for scalable storage and processing of Big Data, it may not be enterprise-ready when it comes to security. Hortonworks, Cloudera and MapR address this problem by providing Enterprise Hadoop distributions. There are also several Hadoop security projects, such as Apache Argus and Knox. But what does Hadoop provide right out of the box?

The bad news is that a fresh Hadoop installation isn't secure, and it was never designed to be. Hadoop's original purpose was to process data arriving in large volume, variety and velocity while letting everyone access that data and run jobs. Things have changed since then, and security features have been added in later Hadoop versions. There are at least four areas of concern for Hadoop security: authentication, authorization, auditing and encryption.

Hadoop Authentication

No one wants anonymous users to browse through their data. That’s why Hadoop supports Kerberos: a mature authentication protocol that has been around since the late eighties.

Nonetheless, to get Kerberos up and running for Hadoop, sysadmins need to install, configure and maintain a Kerberos server (the KDC, or Key Distribution Center). If the organization already runs some other kind of centralized authentication server, this doubles the amount of work. Not to mention that Kerberos is well known as a nightmare to maintain.
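
Once the KDC side is in place, the client side is comparatively straightforward. The following is a minimal sketch, assuming a keytab-based login through Hadoop's UserGroupInformation API; the principal name, realm and keytab path are placeholders rather than anything prescribed by this article.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLoginSketch {
        public static void main(String[] args) throws IOException {
            // Switch the client libraries from the default "simple"
            // (trust-the-supplied-username) mode to Kerberos.
            Configuration conf = new Configuration();
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Authenticate with a keytab; principal and path are placeholders.
            UserGroupInformation.loginUserFromKeytab(
                    "analyst@EXAMPLE.COM", "/etc/security/keytabs/analyst.keytab");

            // Every HDFS call from here on is made as the authenticated principal.
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }

Interactive users follow the same flow with a ticket obtained via kinit; the keytab simply removes the password prompt for services and scheduled jobs.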

Hadoop also supports HTTP simple authentication for its web consoles. This method sends the password in plaintext with every HTTP request. Even if you use SSL to protect it in transit, the password may still be logged by the server and cached by the browser. That isn't secure enough.

Hadoop Authorization

Although Hadoop was founded on the democratic principles of open access to data for all, this isn't right for the enterprise. Organizations need strict control over who can access which data and what they can do with it.

Fortunately, HDFS supports authorization via the traditional file permission model as well as ACLs (Access Control Lists). This makes it possible to control access to files and directories for individual users and groups.
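
As a minimal sketch, this is what those two mechanisms look like through the Java FileSystem API. The path and the extra user name are placeholders, and the ACL call assumes the NameNode has ACLs enabled (dfs.namenode.acls.enabled=true).

    import java.io.IOException;
    import java.util.Collections;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.AclEntry;
    import org.apache.hadoop.fs.permission.AclEntryScope;
    import org.apache.hadoop.fs.permission.AclEntryType;
    import org.apache.hadoop.fs.permission.FsAction;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class HdfsAuthorizationSketch {
        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            Path reports = new Path("/data/reports");  // placeholder path

            // Traditional owner/group/other permissions, equivalent to mode 750:
            // full access for the owner, read/execute for the group, nothing for others.
            fs.setPermission(reports,
                    new FsPermission(FsAction.ALL, FsAction.READ_EXECUTE, FsAction.NONE));

            // A finer-grained ACL entry that grants one additional user read access
            // without changing the file's group.
            AclEntry auditorRead = new AclEntry.Builder()
                    .setScope(AclEntryScope.ACCESS)
                    .setType(AclEntryType.USER)
                    .setName("auditor")                // placeholder user
                    .setPermission(FsAction.READ_EXECUTE)
                    .build();
            fs.modifyAclEntries(reports, Collections.singletonList(auditorRead));
        }
    }

The same operations are available from the shell via hdfs dfs -chmod and hdfs dfs -setfacl.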

What about controlling who can submit jobs to the cluster? For that, Hadoop provides Service Level Authorization: a mechanism that checks that clients connecting to a given Hadoop service actually have permission to use it.
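
As a rough illustration of the knobs involved, the sketch below sets them programmatically only for readability; in a real cluster the global switch lives in core-site.xml and the per-protocol ACLs in hadoop-policy.xml, and the user and group names here are placeholders.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;

    public class ServiceLevelAuthSketch {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();

            // Global switch that turns Service Level Authorization on.
            conf.setBoolean("hadoop.security.authorization", true);

            // Per-protocol ACLs. The value format is "users groups": a comma-separated
            // user list, a space, then a comma-separated group list; a leading space
            // means no individual users.
            conf.set("security.client.protocol.acl", " analysts");      // only the analysts group may act as an HDFS client
            conf.set("security.applicationclient.protocol.acl", "etl"); // only the etl user may submit YARN applications

            conf.writeXml(System.out);  // dump the resulting settings as XML
        }
    }

Hadoop also ships refresh commands (for example, hdfs dfsadmin -refreshServiceAcl) so that ACL changes can be picked up without restarting the cluster.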

Hadoop authorization is pretty tight so far, and it can be tighter: HDFS, Oozie, YARN and every other Hadoop process should run as its own hardened, least-privileged user. If one of these systems is attacked, whether from inside or outside the organization, the attacker's limited permissions keep them from harming the machine or disrupting the other processes.

Hadoop Auditing

Any secure system must include auditing: the ability to monitor and report changes in the system. In Hadoop's case, that means tracking who accessed which data and when, which jobs they ran, which settings they changed, and so on.

Hadoop and its related components do offer built-in audit logging. However, they still have a long way to go: there is no unified, or even consistent, audit format, which makes log analysis genuinely difficult. Intel's Project Rhino, a general Hadoop security initiative, aims to build tools that transform audit logs into a standard format. Until then, auditing is available, but it isn't easy.
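
To make that concrete, here is a minimal sketch of the kind of ad-hoc parsing this currently forces on analysts. It assumes the tab-separated key=value layout that HDFS writes to hdfs-audit.log (allowed=..., ugi=..., cmd=..., src=... and so on); the exact layout varies between versions, and other components log in entirely different formats, which is exactly the problem.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.HashMap;
    import java.util.Map;

    public class AuditLogSketch {

        // Parse one HDFS audit line of the rough form
        // "... FSNamesystem.audit: allowed=true<TAB>ugi=alice ...<TAB>cmd=open<TAB>src=/data/x ..."
        static Map<String, String> parse(String line) {
            Map<String, String> fields = new HashMap<>();
            int start = line.indexOf("allowed=");
            if (start < 0) {
                return fields;  // not an audit line this sketch recognises
            }
            for (String token : line.substring(start).split("\t")) {
                int eq = token.indexOf('=');
                if (eq > 0) {
                    fields.put(token.substring(0, eq), token.substring(eq + 1));
                }
            }
            return fields;
        }

        public static void main(String[] args) throws IOException {
            // The file name is a placeholder for wherever the audit log is shipped.
            for (String line : Files.readAllLines(Paths.get("hdfs-audit.log"))) {
                Map<String, String> fields = parse(line);
                if ("open".equals(fields.get("cmd"))) {
                    System.out.println(fields.get("ugi") + " read " + fields.get("src"));
                }
            }
        }
    }

Every other component (YARN, Hive, Oozie and so on) would need its own parser, which is precisely why a standard format matters.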

Hadoop Encryption

Part of data protection is making sure that the data becomes useless if stolen—whether physically or by a man-in-the-middle attack. Data encryption is the obvious solution.

Fortunately, RPC data (the data exchanged between Hadoop services and clients) and the block data transferred between nodes can both be encrypted. Even connections to Hadoop's web consoles can be encrypted.
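
As a rough sketch of which settings are involved, again shown programmatically only for readability; in practice these belong in core-site.xml and hdfs-site.xml:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;

    public class WireEncryptionSketch {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();

            // RPC traffic between clients and services: "privacy" adds encryption
            // on top of authentication and integrity checking.
            conf.set("hadoop.rpc.protection", "privacy");

            // Block data streamed between clients and DataNodes.
            conf.setBoolean("dfs.encrypt.data.transfer", true);

            // Serve the HDFS web consoles over HTTPS only.
            conf.set("dfs.http.policy", "HTTPS_ONLY");

            conf.writeXml(System.out);  // print the effective configuration
        }
    }

The usual trade-off applies: "privacy" encrypts every RPC payload, so expect some extra CPU cost on busy clusters.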

Well, what about the data itself? Sadly, HDFS doesn't yet support encryption of data at rest. Once again, Project Rhino comes to the rescue by promising an encryption and key management framework for Hadoop. They're still working on it.

Summary

Hadoop isn’t secure for the enterprise right out of the box. Nonetheless, it comes with several built-in security features such as Kerberos authentication, HDFS file permissions, Service Level Authorization, audit logging and network encryption. These need to be set up and configured by a sysadmin.

Organizations that need stronger security will probably opt for a Hadoop distribution from Hortonworks, Cloudera, or MapR. These distributions include extra security measures as well as integration with Apache Hadoop security projects, thus making it safe to let the elephant in.

 

 
