文档章节

Hadoop CDH4.5 监控部署--ganglia

China_OS
 China_OS
发布于 2014/05/23 12:21
字数 2069
阅读 194
收藏 1
点赞 0
评论 0

       Hadoop集群的监控CDH有个自带的界面可以查看,开源软件里面和hadoop接口做的比较好的就是ganglia了,官方也是建议试用ganglia监控hadoop。下面我们在现有的hadoop集群上部署ganglia监控。还好很久之前就接触维护过ganglia了,部署起来也不是很麻烦,关于ganglia的基础只是请自行google。

ganglia集群部署信息如下

192.168.1.10    U-1    ganglia-monitor    ganglia-webfrontend
192.168.1.20    U-2    ganglia-monitor
192.168.1.30    U-3    ganglia-monitor
192.168.1.40    U-4    ganglia-monitor
192.168.1.50    U-5    ganglia-monitor
192.168.1.70    U-7    ganglia-monitor

ganglia-webfrontend    是ganglia的前端界面

ganglia-monitor    是ganglia的数据收集客户端

1    在各个节点安装数据收集客户端(U-1/2/3/4/5)

apt-get install ganglia-monitor

2    配置ganglia-monitor(U-1/2/3/4/5)

        客户端配置文件使/etc/ganglia/gmond.conf

/* This configuration is as close to 2.5.x default behavior as possible 
   The values closely match ./gmond/metric.h definitions in 2.5.x */ 
globals {                    
  daemonize = yes              
  setuid = yes             
  user = root              
  debug_level = 0               
  max_udp_msg_len = 1472        
  mute = no             
  deaf = no             
  host_dmax = 0 /*secs */ 
  cleanup_threshold = 300 /*secs */ 
  gexec = no             
  send_metadata_interval = 0     
} 

/* If a cluster attribute is specified, then all gmond hosts are wrapped inside 
 * of a <CLUSTER> tag.  If you do not specify a cluster tag, then all <HOSTS> will 
 * NOT be wrapped inside of a <CLUSTER> tag. */ 
cluster { 
  name = "Hadoop-CDH4.5" 
  owner = "root" 
  latlong = "unspecified" 
  url = "unspecified" 
} 

/* The host section describes attributes of the host, like the location */ 
host { 
  location = "192.168.1.30" 
} 

/* Feel free to specify as many udp_send_channels as you like.  Gmond 
   used to only support having a single channel */ 
udp_send_channel { 
  host = 192.168.1.10 
  port = 8649 
  ttl = 1 
} 

/* You can specify as many udp_recv_channels as you like as well. */ 
udp_recv_channel { 
  port = 8649  
} 

/* You can specify as many tcp_accept_channels as you like to share 
   an xml description of the state of the cluster */ 
tcp_accept_channel { 
  port = 8649 
} 

/* Each metrics module that is referenced by gmond must be specified and 
   loaded. If the module has been statically linked with gmond, it does not 
   require a load path. However all dynamically loadable modules must include 
   a load path. */ 
modules { 
  module { 
    name = "core_metrics" 
  } 
  module { 
    name = "cpu_module" 
    path = "/usr/lib/ganglia/modcpu.so" 
  } 
  module { 
    name = "disk_module" 
    path = "/usr/lib/ganglia/moddisk.so" 
  } 
  module { 
    name = "load_module" 
    path = "/usr/lib/ganglia/modload.so" 
  } 
  module { 
    name = "mem_module" 
    path = "/usr/lib/ganglia/modmem.so" 
  } 
  module { 
    name = "net_module" 
    path = "/usr/lib/ganglia/modnet.so" 
  } 
  module { 
    name = "proc_module" 
    path = "/usr/lib/ganglia/modproc.so" 
  } 
  module { 
    name = "sys_module" 
    path = "/usr/lib/ganglia/modsys.so" 
  } 
} 

include ('/etc/ganglia/conf.d/*.conf') 


/* The old internal 2.5.x metric array has been replaced by the following 
   collection_group directives.  What follows is the default behavior for 
   collecting and sending metrics that is as close to 2.5.x behavior as 
   possible. */

/* This collection group will cause a heartbeat (or beacon) to be sent every 
   20 seconds.  In the heartbeat is the GMOND_STARTED data which expresses 
   the age of the running gmond. */ 
collection_group { 
  collect_once = yes 
  time_threshold = 20 
  metric { 
    name = "heartbeat" 
  } 
} 

/* This collection group will send general info about this host every 1200 secs. 
   This information doesn't change between reboots and is only collected once. */ 
collection_group { 
  collect_once = yes 
  time_threshold = 1200 
  metric { 
    name = "cpu_num" 
    title = "CPU Count" 
  } 
  metric { 
    name = "cpu_speed" 
    title = "CPU Speed" 
  } 
  metric { 
    name = "mem_total" 
    title = "Memory Total" 
  } 
  /* Should this be here? Swap can be added/removed between reboots. */ 
  metric { 
    name = "swap_total" 
    title = "Swap Space Total" 
  } 
  metric { 
    name = "boottime" 
    title = "Last Boot Time" 
  } 
  metric { 
    name = "machine_type" 
    title = "Machine Type" 
  } 
  metric { 
    name = "os_name" 
    title = "Operating System" 
  } 
  metric { 
    name = "os_release" 
    title = "Operating System Release" 
  } 
  metric { 
    name = "location" 
    title = "Location" 
  } 
} 

/* This collection group will send the status of gexecd for this host every 300 secs */
/* Unlike 2.5.x the default behavior is to report gexecd OFF.  */ 
collection_group { 
  collect_once = yes 
  time_threshold = 300 
  metric { 
    name = "gexec" 
    title = "Gexec Status" 
  } 
} 

/* This collection group will collect the CPU status info every 20 secs. 
   The time threshold is set to 90 seconds.  In honesty, this time_threshold could be 
   set significantly higher to reduce unneccessary network chatter. */ 
collection_group { 
  collect_every = 20 
  time_threshold = 90 
  /* CPU status */ 
  metric { 
    name = "cpu_user"  
    value_threshold = "1.0" 
    title = "CPU User" 
  } 
  metric { 
    name = "cpu_system"   
    value_threshold = "1.0" 
    title = "CPU System" 
  } 
  metric { 
    name = "cpu_idle"  
    value_threshold = "5.0" 
    title = "CPU Idle" 
  } 
  metric { 
    name = "cpu_nice"  
    value_threshold = "1.0" 
    title = "CPU Nice" 
  } 
  metric { 
    name = "cpu_aidle" 
    value_threshold = "5.0" 
    title = "CPU aidle" 
  } 
  metric { 
    name = "cpu_wio" 
    value_threshold = "1.0" 
    title = "CPU wio" 
  } 
  /* The next two metrics are optional if you want more detail... 
     ... since they are accounted for in cpu_system.  
  metric { 
    name = "cpu_intr" 
    value_threshold = "1.0" 
    title = "CPU intr" 
  } 
  metric { 
    name = "cpu_sintr" 
    value_threshold = "1.0" 
    title = "CPU sintr" 
  } 
  */ 
} 

collection_group { 
  collect_every = 20 
  time_threshold = 90 
  /* Load Averages */ 
  metric { 
    name = "load_one" 
    value_threshold = "1.0" 
    title = "One Minute Load Average" 
  } 
  metric { 
    name = "load_five" 
    value_threshold = "1.0" 
    title = "Five Minute Load Average" 
  } 
  metric { 
    name = "load_fifteen" 
    value_threshold = "1.0" 
    title = "Fifteen Minute Load Average" 
  }
} 

/* This group collects the number of running and total processes */ 
collection_group { 
  collect_every = 80 
  time_threshold = 950 
  metric { 
    name = "proc_run" 
    value_threshold = "1.0" 
    title = "Total Running Processes" 
  } 
  metric { 
    name = "proc_total" 
    value_threshold = "1.0" 
    title = "Total Processes" 
  } 
}

/* This collection group grabs the volatile memory metrics every 40 secs and 
   sends them at least every 180 secs.  This time_threshold can be increased 
   significantly to reduce unneeded network traffic. */ 
collection_group { 
  collect_every = 40 
  time_threshold = 180 
  metric { 
    name = "mem_free" 
    value_threshold = "1024.0" 
    title = "Free Memory" 
  } 
  metric { 
    name = "mem_shared" 
    value_threshold = "1024.0" 
    title = "Shared Memory" 
  } 
  metric { 
    name = "mem_buffers" 
    value_threshold = "1024.0" 
    title = "Memory Buffers" 
  } 
  metric { 
    name = "mem_cached" 
    value_threshold = "1024.0" 
    title = "Cached Memory" 
  } 
  metric { 
    name = "swap_free" 
    value_threshold = "1024.0" 
    title = "Free Swap Space" 
  } 
} 

collection_group { 
  collect_every = 40 
  time_threshold = 300 
  metric { 
    name = "bytes_out" 
    value_threshold = 4096 
    title = "Bytes Sent" 
  } 
  metric { 
    name = "bytes_in" 
    value_threshold = 4096 
    title = "Bytes Received" 
  } 
  metric { 
    name = "pkts_in" 
    value_threshold = 256 
    title = "Packets Received" 
  } 
  metric { 
    name = "pkts_out" 
    value_threshold = 256 
    title = "Packets Sent" 
  } 
}

/* Different than 2.5.x default since the old config made no sense */ 
collection_group { 
  collect_every = 1800 
  time_threshold = 3600 
  metric { 
    name = "disk_total" 
    value_threshold = 1.0 
    title = "Total Disk Space" 
  } 
}

collection_group { 
  collect_every = 40 
  time_threshold = 180 
  metric { 
    name = "disk_free" 
    value_threshold = 1.0 
    title = "Disk Space Available" 
  } 
  metric { 
    name = "part_max_used" 
    value_threshold = 1.0 
    title = "Maximum Disk Space Used" 
  } 
}
3    安装ganglia-webfrontend(U-1)

apt-get install ganglia-webfrontend

4    配置gmetad.conf

        配置文件在/etc/ganglia/gmetad.conf

# This is an example of a Ganglia Meta Daemon configuration file
#                http://ganglia.sourceforge.net/
#
# $Id: gmetad.conf.in 2014 2009-08-10 10:44:09Z d_pocock $
#
#-------------------------------------------------------------------------------
# Setting the debug_level to 1 will keep daemon in the forground and
# show only error messages. Setting this value higher than 1 will make 
# gmetad output debugging information and stay in the foreground.
# default: 0
# debug_level 10
#
#-------------------------------------------------------------------------------
# What to monitor. The most important section of this file. 
#
# The data_source tag specifies either a cluster or a grid to
# monitor. If we detect the source is a cluster, we will maintain a complete
# set of RRD databases for it, which can be used to create historical 
# graphs of the metrics. If the source is a grid (it comes from another gmetad),
# we will only maintain summary RRDs for it.
#
# Format: 
# data_source "my cluster" [polling interval] address1:port addreses2:port ...
# 
# The keyword 'data_source' must immediately be followed by a unique
# string which identifies the source, then an optional polling interval in 
# seconds. The source will be polled at this interval on average. 
# If the polling interval is omitted, 15sec is asssumed. 
#
# A list of machines which service the data source follows, in the 
# format ip:port, or name:port. If a port is not specified then 8649
# (the default gmond port) is assumed.
# default: There is no default value
#
# data_source "my cluster" 10 localhost  my.machine.edu:8649  1.2.3.5:8655
# data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
# data_source "another source" 1.3.4.7:8655  1.3.4.8

data_source "Hadoop-CDH4.6" 192.168.1.10

#
# Round-Robin Archives
# You can specify custom Round-Robin archives here (defaults are listed below)
#
# RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" \
#      "RRA:AVERAGE:0.5:5760:374"
#

#
#-------------------------------------------------------------------------------
# Scalability mode. If on, we summarize over downstream grids, and respect
# authority tags. If off, we take on 2.5.0-era behavior: we do not wrap our output
# in <GRID></GRID> tags, we ignore all <GRID> tags we see, and always assume
# we are the "authority" on data source feeds. This approach does not scale to
# large groups of clusters, but is provided for backwards compatibility.
# default: on
# scalable off
#
#-------------------------------------------------------------------------------
# The name of this Grid. All the data sources above will be wrapped in a GRID
# tag with this name.
# default: unspecified
# gridname "MyGrid"
#
#-------------------------------------------------------------------------------
# The authority URL for this grid. Used by other gmetads to locate graphs
# for our data sources. Generally points to a ganglia/
# website on this machine.
# default: "http://hostname/ganglia/",
#   where hostname is the name of this machine, as defined by gethostname().
# authority "http://mycluster.org/newprefix/"
#
#-------------------------------------------------------------------------------
# List of machines this gmetad will share XML with. Localhost
# is always trusted. 
# default: There is no default value
# trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
#
#-------------------------------------------------------------------------------
# If you want any host which connects to the gmetad XML to receive
# data, then set this value to "on"
# default: off
# all_trusted on
#
#-------------------------------------------------------------------------------
# If you don't want gmetad to setuid then set this to off
# default: on
# setuid off
#
#-------------------------------------------------------------------------------
# User gmetad will setuid to (defaults to "nobody")
# default: "nobody"
# setuid_username "nobody"
#
#-------------------------------------------------------------------------------
# The port gmetad will answer requests for XML
# default: 8651
# xml_port 8651
#
#-------------------------------------------------------------------------------
# The port gmetad will answer queries for XML. This facility allows
# simple subtree and summation views of the XML tree.
# default: 8652
# interactive_port 8652
#
#-------------------------------------------------------------------------------
# The number of threads answering XML requests
# default: 4
# server_threads 10
#
#-------------------------------------------------------------------------------
# Where gmetad stores its round-robin databases
# default: "/var/lib/ganglia/rrds"
# rrd_rootdir "/some/other/place"
#
#-------------------------------------------------------------------------------
# In earlier versions of gmetad, hostnames were handled in a case
# sensitive manner
# If your hostname directories have been renamed to lower case,
# set this option to 0 to disable backward compatibility.
# From version 3.2, backwards compatibility will be disabled by default.
# default: 1   (for gmetad < 3.2)
# default: 0   (for gmetad >= 3.2)
case_sensitive_hostnames 1
5    配置apache2

        ganglia-webfrontend默认安装完毕后的web路径在/usr/share/ganglia-webfrontend目录下,而apache2的默认的web路径在/var/www,所以需要在/var/www目录下做个软连接

cd /var/www
ln -s ganglia /usr/share/ganglia-webfrontend
6    在浏览器输入192.168.1.10/ganglia查看效果


7    这个界面有点搓啊,ganglia新版的界面已经很不错了,我们把自带的3.1.7版本的ganglia-web替换成3.5.12版本

        在官网下载ganglia-web-3.5.12.tar.gz版本到本地,你也可以下载最新版的。

8    安装ganglia-web-3.5.12.tar.gz

tar zxvf ganglia-web-3.5.12.tar.gz
cd ganglia-web-3.5.12
          1    为了后面部署方便修改Makefile文件,只要修改下面几个参数即可

#GDESTDIR = /usr/share/ganglia-webfrontend
GDESTDIR = /var/www/ganglia2

# Location where default apache configuration should be installed to.
GCONFDIR = /etc/apache2/sites-enabled/ganglia-web

# Gweb statedir (where conf dir and Dwoo templates dir are stored)
GWEB_STATEDIR = /var/lib/ganglia-web

# Gmetad rootdir (parent location of rrd folder)
GMETAD_ROOTDIR = /var/lib/ganglia

APACHE_USER = www-data
          2    安装

make install

        3    这时在/var/www目录下面会有一个ganglia2目录,在浏览器输入192.168.1.10/ganglia2查看效果图


9    关于ganglia的定制

        上面只是基础的安装,你肯定有一些自己要定义的元素显示在web界面上,都可以通过修改gmetad.conf来实现,ganglia也有类似zabbix proxy的机制,就是一组节点把数据发送到一个机器,然后由该机器把数据发送到总节点上,这样可以减轻总节点的链接负载,也方便扩展。由于我的hadoop集群比较小,还暂时用不到该模式。




© 著作权归作者所有

共有 人打赏支持
China_OS
粉丝 403
博文 438
码字总数 487778
作品 0
徐汇
技术主管
Hadoop CDH4.5升级CDH5 以及NameNode和YARN HA实战

CDH5支持很多新特性,所以打算把当前的CDH4.5升级到CDH5,软件部署还是以之前的CDH4.5集群为基础 192.168.1.10 U-1 (Active) hadoop-yarn-resourcemanager hadoop-hdfs-namenode hadoop-mapr...

China_OS
2014/05/27
0
0
Hadoop (CDH4发行版)集群部署

前言 折腾了一段时间hadoop的部署管理,写下此系列博客记录一下。 为了避免各位做部署这种重复性的劳动,我已经把部署的步骤写成脚本,各位只需要按着本文把脚本执行完,整个环境基本就部署完...

snakelxc
2013/07/10
0
19
在本地CentOs伪分布式hadoop监控,Linux无法正常启动

在本地CentOs伪分布式hadoop部署了ganglia监控。等到下次重新启动时候(我的是虚拟机环境Vmware),发现无法正常启动界面。但是通过putty等客户端可以登录操作。然后其他的可以在命令行正常操...

欧阳天涯
2013/10/12
219
0
hadoop集群监控工具--ganglia的搭建(YUM的方式)

Ganglia架构简介: Ganglia 是一款为HPC(高性能计算) 集群设计的可扩展性的分布式监控系统,它可以监视和显示集群中节点的各种状态信息,它由运行在各个节点上的守护进程gmond 采集 CPU、内存...

dear_csdn
2017/08/01
0
0
配置hadoop的ganglia监控,为什么datanode节点的监控没有??

在hadoop集群中配置了ganglia ,但是监控页面却没有数据节点(datanode)的监控情况,是不是datanode的服务器也要配置ganglia??因为现在就只有namenode配置了ganglia,而datanode却没有。 我的g...

黑帽子
2014/06/27
1K
1
Linux下的监控器之一Ganglia详解与部署

Ganglia基础详解 Ganglia介绍 Ganglia是一个跨平台可扩展的,高性能计算系统下的分布式监控系统,如集群和网格。它是基于分层设计,它使用广泛的技术,如XML数据代表,便携数据传输,RRDtool...

Insane_linux
2017/08/10
0
0
使用Ambari快速部署Hadoop大数据环境

做大数据相关的后端开发工作一年多来,随着Hadoop社区的不断发展,也在不断尝试新的东西,本文着重来讲解下Ambari,这个新的Apache的项目,旨在让大家能够方便快速的配置和部署Hadoop生态圈相关的...

cnxk
2013/12/05
0
0
云监控 Ganglia 安装步骤 (含python module)

前言 最近在研究云监控的相关工具,感觉ganglia颇有亮点,能从一个集群整体的角度来展现数据. 但是安装过程稍过复杂,相关依赖稍多,故写此文章与大家分享下. 本文不讲解相关原理,若想了解请参考...

一只小逛
2013/12/04
11.9K
4
ganglia监控hadoop 容器节点

hadoop容器运行参考上篇博客 http://blog.csdn.net/wenwenxiong/article/details/78973755 参看网址: https://gist.github.com/ameizi/0c77e3dbb13ded779347 在namenode容器节点上安装gangl......

wenwenxiong
01/04
0
0
基于Ganglia实现集群性能态势感知

回顾 通过前面的发布过的两篇文章,我们已经大致掌握了描述单个服务器的性能情况的方法。可以从load avgerage等总括性的数据着手,获得系统资源利用率(CPU、内存、I/O、网络)和进程运行情况...

RiboseYim
2016/11/04
202
1

没有更多内容

加载失败,请刷新页面

加载更多

下一页

百度云iOS架构师在职场中的忠告

1.工具不能代替思考 在我多年的咨询工作和与许多组织和管理者的共事中,我发现了修复问题的共同套路,那就是管理人员相信工具可以“解决”给出的问题。当问题域被理解透彻,并且不可能有很多...

_小迷糊
20分钟前
0
0
Java基础——异常

声明:本栏目所使用的素材都是凯哥学堂VIP学员所写,学员有权匿名,对文章有最终解释权;凯哥学堂旨在促进VIP学员互相学习的基础上公开笔记。 异常处理: 可以挖很多个陷阱,但是不要都是一样...

凯哥学堂
33分钟前
0
0
180723-Quick-Task 动态脚本支持框架之结构设计篇

文章链接:https://liuyueyi.github.io/hexblog/2018/07/23/180723-Quick-Task-动态脚本支持框架之结构设计篇/ Quick-Task 动态脚本支持框架之结构设计篇 相关博文: 180702-QuickTask动态脚本...

小灰灰Blog
36分钟前
0
0
SBT 常用开发技巧

SBT 一直以来都是 Scala 开发者不可言说的痛,最主要的原因就是官方文档维护质量较差,没有经过系统的、循序渐进式的整理,导致初学者入门门槛较高。虽然也有其它构建工具可以选择(例如 Mill...

joymufeng
41分钟前
0
0
HBase in Practice - 性能、监控及问题解决

李钰(社区ID:Yu Li),阿里巴巴计算平台事业部高级技术专家,HBase开源社区PMC&committer。开源技术爱好者,主要关注分布式系统设计、大数据基础平台建设等领域。连续4年基于HBase/HDFS设计和...

中国HBase技术社区
42分钟前
1
0
ES18-JAVA API 批量操作

1.批量查询 Multi Get API public static void multiGet() {// 批量查询MultiGetResponse response = getClient().prepareMultiGet().add("my_person", "my_index", "1")// 查......

贾峰uk
46分钟前
0
0
SpringBoot2.0使用health

1,引入actuator <dependency><groupId>org.springframework.boot</groupId><artifactId>spring-boot-starter-actuator</artifactId></dependency> 2,application.properties ......

暗中观察
53分钟前
0
0
阿里巴巴Java开发规约

###编程规约 命名风格 【强制】代码中的命名均不能以下划线或美元符号开始,也不能以下划线或美元符号结束 【强制】代码中的命名严禁使用拼音与英文混合的方式,更不允许直接使用中文的方式。...

简心
58分钟前
0
0
如何用TypeScript来创建一个简单的Web应用

转载地址 如何用TypeScript来创建一个简单的Web应用 安装TypeScript 获取TypeScript工具的方式: 通过npm(Node.js包管理器) npm install -g typescript 构建你的第一个TypeScript文件 创建...

durban
今天
0
0
分享好友,朋友圈自定义分享链接无效

这个问题是微信6.5.6版本以后,修改了分享规则:分享的连接必须在公众号后台设定的js安全域名内

LM_Mike
今天
0
0

没有更多内容

加载失败,请刷新页面

加载更多

下一页

返回顶部
顶部