Document section

Official recommended settings for Cassandra tuning

 zxpost
Published 2017/08/21 15:20

Recommended production settings 

The following sections provide recommendations for optimizing your DataStax Enterprise installation on Linux:

Use the latest Java Virtual Machine 

Use the latest 64-bit version of Oracle Java Platform, Standard Edition 8 (JDK) or OpenJDK 8.

Synchronize clocks 

Synchronize the clocks on all nodes and application servers. Use NTP (Network Time Protocol) or other methods.

This is required because DataStax Enterprise (DSE) overwrites a column only if there is another version whose timestamp is more recent, which can happen when machines are in different locations.

DSE timestamps are encoded as microseconds since UNIX epoch without timezone information. The timestamp for all writes in DSE is UTC (Universal Time Coordinated). DataStax recommends converting to local time only when generating output to be read by humans.
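
As a rough illustration of the encoding described above, the following sketch converts a microsecond write timestamp into a human-readable UTC string. The function name us_to_utc is hypothetical and not part of DSE, and the sketch assumes GNU date (which accepts -d @seconds), as found on most Linux distributions.

```shell
# Hypothetical helper (not part of DSE): convert a write timestamp given in
# microseconds since the UNIX epoch into an ISO 8601 UTC string.
# Assumes GNU date, which accepts "-d @<seconds>".
us_to_utc() {
    local micros="$1"
    local secs
    secs=$(( micros / 1000000 ))   # drop the sub-second part
    date -u -d "@${secs}" '+%Y-%m-%dT%H:%M:%SZ'
}
```

For example, us_to_utc 1503328800000000 prints 2017-08-21T15:20:00Z. Conversion to local time can then be left to whatever layer produces output for human readers.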

TCP settings 

To handle thousands of concurrent connections used by DataStax Enterprise, DataStax recommends these settings to optimize the Linux network stack. Add these settings to /etc/sysctl.conf.

net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

To set immediately (depending on your distribution):

sudo sysctl -p /etc/sysctl.conf
sudo sysctl -p /etc/sysctl.d/filename.conf
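
To check that a configuration file actually carries the settings listed above, something like the following sketch may help. The function names normalize and missing_settings are hypothetical. The sketch only compares file contents; it does not read the live kernel values.

```shell
# Sketch: list recommended TCP settings that a sysctl-style file lacks.
# Function names are hypothetical; the values are the ones listed above.
RECOMMENDED='net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216'

# Strip comments, collapse whitespace, and rewrite "key = value" as "key=value".
normalize() {
    sed -e 's/#.*//' \
        -e 's/[[:space:]][[:space:]]*/ /g' \
        -e 's/^ //' -e 's/ $//' \
        -e 's/ *= */=/' | grep -v '^$'
}

# Print each recommended setting that is absent from (or differs in) file $1.
missing_settings() {
    local conf="$1" line
    printf '%s\n' "$RECOMMENDED" | normalize | while IFS= read -r line; do
        normalize < "$conf" | grep -qxF -e "$line" || printf '%s\n' "$line"
    done
}
```

Running missing_settings /etc/sysctl.conf prints nothing when every recommended line is present with the recommended value.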

Disable CPU frequency scaling 

Recent Linux systems include a feature called CPU frequency scaling or CPU speed scaling. It allows a server's clock speed to be dynamically adjusted so that the server can run at lower clock speeds when the demand or load is low. This reduces the server's power consumption and heat output (which significantly impacts cooling costs). Unfortunately, this behavior has a detrimental effect on servers running DataStax Enterprise because throughput can get capped at a lower rate.

On most Linux systems, a CPUfreq governor manages the scaling of frequencies based on defined rules. The default ondemand governor switches the clock frequency to maximum when demand is high and to the lowest frequency when the system is idle.

Do not use governors that lower the CPU frequency. To ensure optimal performance, reconfigure all CPUs to use the performance governor, which locks the frequency at maximum. This governor will not switch frequencies, which means there will be no power savings but the servers will always run at maximum throughput. On most systems, set the governor as follows:

# Run as root: set the frequency governor of every CPU to "performance".
for CPUFREQ in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
do
    [ -f "$CPUFREQ" ] || continue
    echo -n performance > "$CPUFREQ"
done

For more information, see High server load and latency when CPU frequency scaling is enabled in the DataStax Help Center.
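
After changing the governor, it can be useful to confirm that no CPU was missed. The sketch below reports any CPU whose governor is not performance. The function name non_performance_cpus is hypothetical, and the optional directory argument exists only so the function can be pointed at a test directory instead of /sys.

```shell
# Sketch: report CPUs whose governor is not "performance".
# non_performance_cpus is a hypothetical name; $1 optionally overrides the
# sysfs base directory (default: /sys/devices/system/cpu).
non_performance_cpus() {
    local base="${1:-/sys/devices/system/cpu}" file cpu gov
    for file in "$base"/cpu*/cpufreq/scaling_governor; do
        [ -f "$file" ] || continue
        gov="$(cat "$file")"
        cpu="${file#"$base"/}"      # e.g. cpu3/cpufreq/scaling_governor
        cpu="${cpu%%/*}"            # e.g. cpu3
        [ "$gov" = "performance" ] || printf '%s %s\n' "$cpu" "$gov"
    done
}
```

Empty output means every CPU that exposes a scaling_governor file is using the performance governor.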

Make sure that new settings persist after reboot 

CAUTION:

Depending on your environment, some of the following settings may not be persisted after reboot. Check with your system administrator to ensure they are viable for your environment.

Optimize SSDs 

The default SSD configurations on most Linux distributions are not optimal. Follow these steps to ensure the best settings for SSDs:

  1. Ensure that the SysFS rotational flag is set to false (zero).

    This overrides any detection by the operating system to ensure the drive is considered an SSD.

  2. Apply the same rotational flag setting for any block devices created from SSD storage, such as mdarrays.
  3. Set the IO scheduler to either deadline or noop:
    • The noop scheduler is the right choice when the target block device is an array of SSDs behind a high-end IO controller that performs IO optimization.
    • The deadline scheduler optimizes requests to minimize IO latency. If in doubt, use the deadline scheduler.
  4. Set the readahead value for the block device to 8 KB.

    This setting tells the operating system not to read extra bytes, which can increase IO time and pollute the cache with bytes that weren’t requested by the user.

    For example, if the SSD is /dev/sda, in /etc/rc.local:

    echo deadline > /sys/block/sda/queue/scheduler
    #OR...
    #echo noop > /sys/block/sda/queue/scheduler
    touch /var/lock/subsys/local
    echo 0 > /sys/class/block/sda/queue/rotational
    echo 8 > /sys/class/block/sda/queue/read_ahead_kb
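
The steps above can be verified with a sketch along these lines, which inspects one device's queue directory. The function name check_ssd_queue is hypothetical. Accepting mq-deadline and none alongside deadline and noop is an assumption for newer multi-queue kernels, where the schedulers carry those names; it is not part of the recommendation above.

```shell
# Sketch: check one block device's queue directory against the SSD settings
# above. check_ssd_queue is a hypothetical name. Treating "mq-deadline" and
# "none" as acceptable is an assumption for newer multi-queue kernels.
check_ssd_queue() {
    local q="$1" sched
    # The scheduler file marks the active choice in brackets,
    # for example: noop [deadline] cfq
    sched="$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$q/scheduler")"
    case "$sched" in
        deadline|noop|mq-deadline|none) ;;
        *) echo "scheduler is '$sched'" ;;
    esac
    [ "$(cat "$q/rotational")" = "0" ] || echo "rotational flag is not 0"
    [ "$(cat "$q/read_ahead_kb")" = "8" ] || echo "read_ahead_kb is not 8"
}
```

For the /dev/sda example, check_ssd_queue /sys/block/sda/queue prints one line per setting that deviates and nothing when all three match.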

Use the optimum --setra setting for RAID on SSD 

The optimum readahead setting for RAID on SSDs (in Amazon EC2) is 8KB, the same as it is for non-RAID SSDs. For details, see Optimizing SSDs.

Disable zone_reclaim_mode on NUMA systems 

The Linux kernel can be inconsistent in enabling/disabling zone_reclaim_mode. This can result in odd performance problems.

To ensure that zone_reclaim_mode is disabled:

$ echo 0 | sudo tee /proc/sys/vm/zone_reclaim_mode

For more information, see Peculiar Linux kernel performance problem on NUMA systems.

Set user resource limits 

Use the ulimit -a command to view the current limits. Although limits can also be temporarily set using this command, DataStax recommends making the changes permanent:

Package and Installer-Services installations:

Ensure that the following settings are included in the /etc/security/limits.d/cassandra.conf file:

<cassandra_user> - memlock unlimited
<cassandra_user> - nofile 100000
<cassandra_user> - nproc 32768
<cassandra_user> - as unlimited

Tarball and Installer-No Services installations:

In RHEL version 6.x, ensure that the following settings are included in the /etc/security/limits.conf file:

<cassandra_user> - memlock unlimited
<cassandra_user> - nofile 100000
<cassandra_user> - nproc 32768
<cassandra_user> - as unlimited

If you run DataStax Enterprise as root, some Linux distributions, such as Ubuntu, require setting the limits for root explicitly instead of using cassandra_user:

root - memlock unlimited
root - nofile 100000
root - nproc 32768
root - as unlimited

For RHEL 6.x-based systems, also set the nproc limits in /etc/security/limits.d/90-nproc.conf:

<cassandra_user> - nproc 32768

For all installations, add the following line to /etc/sysctl.conf:

vm.max_map_count = 1048575

For installations on Debian and Ubuntu operating systems, the pam_limits.so module is not enabled by default. Edit the /etc/pam.d/su file and uncomment this line:

session    required   pam_limits.so

This change to the PAM configuration file ensures that the system reads the files in the /etc/security/limits.d directory. To make the changes take effect, reboot the server or run the following command:

$ sudo sysctl -p

To confirm the limits are applied to the DataStax Enterprise process, run the following command where pid is the process ID of the currently running DataStax Enterprise process:

$ cat /proc/pid/limits

For more information, see Insufficient user resource limits errors.
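
The limits file can also be checked programmatically. The following sketch extracts the soft limit for one row of a /proc/pid/limits style file. The function name soft_limit is hypothetical. The row labels used by the kernel are "Max locked memory" (memlock), "Max open files" (nofile), "Max processes" (nproc), and "Max address space" (as).

```shell
# Sketch: read a soft limit from a /proc/<pid>/limits style file.
# soft_limit is a hypothetical name; $1 is the file, $2 the row label.
soft_limit() {
    awk -v name="$2" 'index($0, name) == 1 {
        rest = substr($0, length(name) + 1)   # text after the label
        split(rest, cols, " ")                # cols[1] = soft, cols[2] = hard
        print cols[1]
    }' "$1"
}
```

With the recommended settings in place, soft_limit /proc/pid/limits "Max open files" should print 100000, where pid is again the process ID of the running DataStax Enterprise process.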

Disable swap 

Failure to disable swap entirely can severely lower performance. Because the database has multiple replicas and transparent failover, it is preferable for a replica to be killed immediately when memory is low rather than go into swap. This allows traffic to be immediately redirected to a functioning replica instead of continuing to hit the replica that has high latency due to swapping. If your system has a lot of DRAM, swapping still lowers performance significantly because the OS swaps out executable code so that more DRAM is available for caching disks.

If you insist on using swap, you can set vm.swappiness=1. This allows the kernel to swap out only the least-used parts.

To disable swap:

$ sudo swapoff --all

To make this change permanent, remove all swap file entries from /etc/fstab.

For more information, see Nodes seem to freeze after some period of time.
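
To find the entries that would re-enable swap at the next boot, a sketch such as the following lists the uncommented swap lines in an fstab-style file. The function name fstab_swap_entries is hypothetical.

```shell
# Sketch: print the uncommented swap entries in an fstab-style file.
# fstab_swap_entries is a hypothetical name; the default path is /etc/fstab.
# The third field of an fstab line is the filesystem type.
fstab_swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap"' "${1:-/etc/fstab}"
}
```

Empty output means no swap entries remain in the file. Swap that is currently active is listed separately in /proc/swaps.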

Check the Java Hugepages setting

Many modern Linux distributions ship with Transparent Hugepages enabled by default. When Linux uses Transparent Hugepages, the kernel tries to allocate memory in large chunks (usually 2MB), rather than 4K. This can improve performance by reducing the number of pages the CPU must track. However, some applications still allocate memory based on 4K pages. This can cause noticeable performance problems when Linux tries to defrag 2MB pages. For more information, see the Cassandra Java Huge Pages blog and this Red Hat bug report.

To solve this problem, disable defrag for hugepages. Enter:

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag

For more information, including a temporary fix, see No DSE processing but high CPU usage.
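
To confirm the setting took effect, the active mode can be read back from the same file, where the kernel marks it with square brackets (for example: always madvise [never]). A minimal sketch, with the hypothetical function name thp_defrag_mode:

```shell
# Sketch: print the active Transparent Hugepages defrag mode, which the
# kernel marks with square brackets, e.g. "always madvise [never]".
# thp_defrag_mode is a hypothetical name; $1 optionally overrides the path.
thp_defrag_mode() {
    sed -n 's/.*\[\(.*\)\].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/defrag}"
}
```

After the command above has been applied, thp_defrag_mode should print never.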

Set the heap size for optimal Java garbage collection in DataStax Enterprise 

The default JVM garbage collection (GC) for DataStax Enterprise 5.1 is G1.

Note: DataStax does not recommend using G1 when using Java 7. This is due to a problem with class unloading in G1. In Java 7, PermGen fills up indefinitely until a full GC is performed.

Heap size is usually between ¼ and ½ of system memory. Do not devote all memory to heap because it is also used for offheap cache and file system cache.
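
As a starting point for that range, the following sketch prints one quarter and one half of total system memory in MB. The function name heap_range_mb is hypothetical. It reads the MemTotal line, which the kernel reports in kB, from a /proc/meminfo style file.

```shell
# Sketch: print a quarter and a half of total system memory, in MB.
# heap_range_mb is a hypothetical name; it reads MemTotal (reported in kB)
# from a /proc/meminfo style file ($1, default /proc/meminfo).
heap_range_mb() {
    awk '$1 == "MemTotal:" {
        total_mb = int($2 / 1024)
        printf "%d %d\n", int(total_mb / 4), int(total_mb / 2)
    }' "${1:-/proc/meminfo}"
}
```

On a node with 32 GB of memory (MemTotal of 33554432 kB) this prints 8192 16384. The numbers only bound the search; the actual value should still come from observing heap usage on a test node.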

The easiest way to determine the optimum heap size for your environment is:

  1. Set the MAX_HEAP_SIZE in the cassandra-env.sh file to a high arbitrary value on a single node.
  2. View the heap used by that node:
    • Enable GC logging and check the logs to see trends.
    • Use List view in OpsCenter.
  3. Use the value for setting the heap size in the cluster.

Note: This method decreases performance for the test node, but generally does not significantly reduce cluster performance.

If you don't see improved performance, contact the DataStax Services team for additional help in tuning the JVM.

Determining the heap size when using Concurrent-Mark-Sweep (CMS) garbage collection in DataStax Enterprise 

There are many nuances for tuning CMS. It requires time, expertise, and repeated testing to get the best results. DataStax recommends contacting the DataStax Services team instead. Tuning Java resources provides the basic information to get you started.

Set the heap size for optimal Java garbage collection 

See Tuning Java resources.

Apply optimum blockdev --setra settings for RAID on spinning disks 

Typically, a readahead of 128 is recommended.

Check to ensure setra is not set to 65536:

sudo blockdev --report /dev/spinning_disk

To set setra:

sudo blockdev --setra 128 /dev/spinning_disk

Note: The recommended setting for RAID on SSDs is the same as that for SSDs that are not being used in a RAID installation. For details, see Optimizing SSDs.
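
When comparing these numbers with the SSD settings, note that blockdev --setra counts 512-byte sectors, whereas read_ahead_kb in sysfs is expressed in KB. The sketch below does the conversion; the function names sectors_to_kb and kb_to_sectors are hypothetical.

```shell
# Sketch: convert between blockdev --setra units (512-byte sectors) and KB.
# Both function names are hypothetical.
sectors_to_kb() { echo $(( $1 * 512 / 1024 )); }
kb_to_sectors() { echo $(( $1 * 1024 / 512 )); }
```

So a --setra value of 128 corresponds to 64 KB of readahead, the 8 KB recommended for SSDs corresponds to --setra 16, and the 65536 value to avoid corresponds to 32 MB.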

Reposted from: http://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.h
