
http://www.tigase.net/blog-entry/1mln-or-more-onli


By admin on May 29, 2011    

I have been working on clustering code improvements in the Tigase server for the last few months to make it more reliable and to scale better. In the article about XMPP Service sharding - Tigase on Intel ATOMs I presented some preliminary results on a small scale.

In recent weeks I had a great opportunity to run several tests over a Tigase cluster of 10 nodes on much better hardware. The goal was to achieve 1mln online users connected to the cluster and generating sensible traffic. More tests were run to see how the cluster behaves with different numbers of connections and under different loads.

Below are charts taken from two tests: one peaking at 1mln 128k online users with moderate traffic, and the second peaking at 1mln 685k online users with very reduced traffic.

All tests were carried out until the number of connections reached its maximum and for some time after that, to make sure the service stays stable when connections start dropping.

The test for 1mln online users ran with moderate traffic, that is a message from each online user every 400 seconds and a status change every 2800 seconds.

The other test for 1mln 500k online users ran with no additional traffic except user login, roster retrieval, initial presence broadcast and offline presence broadcast on connection close.

The roster size for all online users was 150 elements, of which 20% (30) were online during the test, and the new-connection rate was 100/sec.
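To put these per-connection intervals into cluster-wide numbers, here is a rough back-of-the-envelope sketch in Python. It uses only the parameters described above (1mln online users, a message every 400 seconds, a status change every 2800 seconds, ~30 online roster contacts, 100 new connections/sec); the assumption that each status change is delivered to all ~30 online contacts is illustrative, not a measured value.

    # Rough aggregate-rate estimate for the moderate-traffic test,
    # derived only from the parameters described above.
    ONLINE_USERS = 1_000_000
    MESSAGE_INTERVAL_S = 400      # one message per user every 400 seconds
    PRESENCE_INTERVAL_S = 2800    # one status change per user every 2800 seconds
    ONLINE_ROSTER_CONTACTS = 30   # 20% of a 150-element roster
    NEW_CONNECTIONS_PER_SEC = 100
    CLUSTER_NODES = 10

    messages_per_sec = ONLINE_USERS / MESSAGE_INTERVAL_S
    presence_updates_per_sec = ONLINE_USERS / PRESENCE_INTERVAL_S
    # Assumption: each status change is broadcast to every online roster contact.
    presence_stanzas_per_sec = presence_updates_per_sec * ONLINE_ROSTER_CONTACTS
    ramp_up_hours = ONLINE_USERS / NEW_CONNECTIONS_PER_SEC / 3600

    print(f"messages/sec, cluster-wide:         {messages_per_sec:,.0f}")           # ~2,500
    print(f"presence stanzas/sec, cluster-wide: {presence_stanzas_per_sec:,.0f}")    # ~10,700
    print(f"messages/sec per node:              {messages_per_sec / CLUSTER_NODES:,.0f}")
    print(f"hours to ramp up all connections:   {ramp_up_hours:.1f}")                # ~2.8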

If you are interested in more details, please continue reading...

I guess the first question that comes to your mind is: why such low traffic? Especially looking at the presented charts, there is certainly room for more.

The CPU would most likely handle more, probably at least twice as much traffic, and memory usage shouldn't grow much either, as traffic generates only temporary objects.

Indeed, the average traffic was estimated at a message every 200 seconds and a presence broadcast every 20 minutes on each user connection.

The "high" traffic was estimated to a message every 100 seconds and presence broadcast every 10minutes.

Unfortunately, as always with load tests, the problem was generating enough traffic. I used Tsung 1.3 for testing, which did a really good job simulating user connections from 10 other machines; however, it just couldn't do more than that.
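Tsung takes care of this scheduling itself, but for readers who have not used such a tool, the sketch below (plain Python with asyncio, not Tsung, with stub connect/send functions standing in for a real XMPP client) shows the kind of pacing a generator has to sustain: a 100 connections/sec ramp, each connection then sending a message every 400 seconds.

    # Illustrative pacing sketch, not Tsung: ramp up connections at
    # 100/sec, then have each connection send a message every 400 s.
    # open_xmpp_connection() and send_message() are placeholders for a
    # real XMPP client library.
    import asyncio
    import random

    NEW_CONNECTIONS_PER_SEC = 100
    MESSAGE_INTERVAL_S = 400
    TOTAL_CONNECTIONS = 1_000     # scaled down for the example
    MESSAGES_PER_SESSION = 3      # bounded so the script terminates

    async def open_xmpp_connection(user_id: int) -> str:
        # Placeholder: a real generator would open a TCP/TLS socket,
        # authenticate, fetch the roster and send initial presence here.
        await asyncio.sleep(0)
        return f"connection-{user_id}"

    async def send_message(conn: str) -> None:
        # Placeholder for sending a single <message/> stanza.
        await asyncio.sleep(0)

    async def user_session(user_id: int) -> None:
        conn = await open_xmpp_connection(user_id)
        # Spread sessions randomly across the interval so traffic is smooth.
        await asyncio.sleep(random.uniform(0, MESSAGE_INTERVAL_S))
        for _ in range(MESSAGES_PER_SESSION):
            await send_message(conn)
            await asyncio.sleep(MESSAGE_INTERVAL_S)

    async def main() -> None:
        sessions = []
        for user_id in range(TOTAL_CONNECTIONS):
            sessions.append(asyncio.create_task(user_session(user_id)))
            # Ramp: start NEW_CONNECTIONS_PER_SEC new sessions per second.
            await asyncio.sleep(1 / NEW_CONNECTIONS_PER_SEC)
        await asyncio.gather(*sessions)

    asyncio.run(main())

With the intervals as described in the test this script runs for well over twenty minutes; lower MESSAGE_INTERVAL_S to try it out quickly.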

Test environment

I had 21 identical machines at my disposal for the duration of the tests: 2 x Xeon Quad 2.0GHz, 16GB RAM, 750GB SATA HDD, 1Gb Ethernet.

One machine ran Ubuntu Server 9.04 and was used as the database server, with MySQL 5.1 installed and tuned for the test.

10 machines ran Ubuntu Server 9.04 with the Tigase server installed in cluster mode, with Linux kernel and GC settings tuned for the test. The Tigase server was a version from SVN with some not-yet-committed changes.

10 machines ran Proxmox 1.3 with Debian 5 in virtual machines. Tsung 1.3 on Erlang R13B01 was used as the traffic and load generator.

Results

As we can see in the attached charts, both tests were quite successful.

Of course, nobody wants to run a service for 1mln 600k online users with idle connections. The second test was executed only to check the installation's limits. As we can see on the memory chart, the server completely used up its memory, so with 16GB of RAM not much more is possible. Traffic stayed at a quite stable level, as it was generated only by new user connections in the first phase, then by both new and closing connections in the second phase (hence the CPU load jump), and by closing connections only in the third phase.

Much more interesting are the charts for the 1mln online users test with traffic on each connection. We can clearly see "steps" on the cluster traffic chart and less distinct steps on the session manager traffic chart. They are related to the presence-update "wave" that started every 2800 seconds. CPU usage stayed at about 60% at peak time, with plenty of room for more traffic. Memory consumption was quite high, at about 70% at the peak number of connections.

Other tests

As I mentioned before, I ran several tests to see how the server works under different conditions. There is certainly no room here to present all the charts; however, I could post them if there is interest. Please send me a message or add a comment to the article if you want to see more charts.

The server was tested under the following load levels (the sketch after this list converts each level into approximate cluster-wide rates):

  1. A message every 100 seconds and presence broadcast every 700 seconds on each connection.

  2. A message every 200 seconds and presence broadcast every 1400 seconds on each connection.

  3. A message every 400 seconds and presence broadcast every 2800 seconds on each connection.

  4. A message every 800 seconds and presence broadcast every 5600 seconds on each connection.

  5. No traffic except packets related to user login, roster retrieval, initial presence broadcast and offline presence broadcast.
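For a sense of scale, the sketch below converts these load levels into approximate cluster-wide stanza rates for 1mln connections (scale linearly for the smaller tests); the ~30-contact presence fan-out is the same illustrative assumption as before, and load 5 has no steady-state traffic at all.

    # Approximate cluster-wide rates implied by each load level for
    # 1,000,000 connections. The x30 presence fan-out is an illustrative
    # assumption; load 5 (login-related traffic only) is omitted.
    ONLINE_USERS = 1_000_000
    ONLINE_ROSTER_CONTACTS = 30

    # load level -> (message interval [s], presence interval [s])
    LOAD_LEVELS = {
        1: (100, 700),
        2: (200, 1400),
        3: (400, 2800),
        4: (800, 5600),
    }

    for load, (msg_s, pres_s) in LOAD_LEVELS.items():
        messages = ONLINE_USERS / msg_s
        presence_stanzas = ONLINE_USERS / pres_s * ONLINE_ROSTER_CONTACTS
        print(f"load {load}: ~{messages:,.0f} messages/sec, "
              f"~{presence_stanzas:,.0f} presence stanzas/sec")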

Other tests I have run are listed below:

  1. 250k connections over plain TCP with load 1

  2. 250k connections over SSL with load 1

  3. 500k connections over plain TCP with load 1 and 2

  4. 500k connections over SSL with load 1, 2 and 3

  5. 750k connections over plain TCP with load 2 and 3

  6. 1mln connections over plain TCP with load 2 and 3

  7. 1mln 500k connections over plain TCP with load 5

Please note that the given maximum number of connections is a target number; the actual tests usually reached more.

Charts

All charts display plots for all 10 cluster nodes, with a different colour for each node. In most cases only one plot (blue) is visible, as user distribution was very even and the load therefore the same on every node. This is especially confusing on the connections chart, where all 10 plots look like a single blue line.

While the plots display values for a particular node, the chart title displays the sum for all nodes; the max is the maximum total registered by the monitor.

