Published on 2014/11/21 12:06

By admin on May 29, 2011    

I have been working on clustering code improvements in the Tigase server for the last few months to make it more reliable and scale better. In the article about XMPP Service sharding - Tigase on Intel ATOMs, I presented some preliminary results on a small scale.

In recent weeks I had a great opportunity to run several tests on a 10-node Tigase cluster on much better hardware. The goal was to reach 1mln online users connected to the cluster and generating sensible traffic. More tests were run to see how the cluster behaves with different numbers of connections and under different loads.

Below are charts taken from two tests: one with a peak of 1mln 128k online users and moderate traffic, and the second with a peak of 1mln 685k online users and very reduced traffic.

All tests were carried out until the number of connections reached its maximum and for some time after that, to make sure the service remained stable when connections started dropping.

The test for 1mln online users ran with moderate traffic, that is, a message from each online user every 400 seconds and a status change every 2800 seconds.

The other test, for 1mln 500k online users, ran with no additional traffic except user login, roster retrieval, initial presence broadcast and offline presence broadcast on connection close.

The roster size for all online users was 150 elements, of which 20% (30) were online during the test, and the new connection rate was 100/sec.
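
To put these figures in perspective, here is a rough back-of-envelope sketch. It assumes that 100 connections/sec is the aggregate rate for the whole cluster and that every status change is delivered to each of the 30 online roster contacts; neither is stated explicitly above, and the function names are only illustrative.

```python
# Back-of-envelope estimates from the figures quoted above.
# Assumptions (not stated in the article): 100 connections/sec is the
# aggregate rate for the whole cluster, and every status change is
# delivered to each of the 30 online roster contacts.


def ramp_up_hours(target_connections: int, connections_per_second: float) -> float:
    """Time needed to open all connections at a constant rate, in hours."""
    return target_connections / connections_per_second / 3600


def presence_deliveries_per_second(
    online_users: int, presence_interval_s: float, online_contacts: int
) -> float:
    """Average number of presence packets delivered per second cluster-wide."""
    return online_users / presence_interval_s * online_contacts


if __name__ == "__main__":
    hours = ramp_up_hours(1_000_000, 100)
    deliveries = presence_deliveries_per_second(1_000_000, 2800, 30)
    print(f"ramp-up to 1mln users: {hours:.1f} h")
    print(f"average presence deliveries: {deliveries:.0f}/s")
```

Under these assumptions the ramp-up to 1mln users takes a little under 3 hours, and status changes alone average roughly 10.7k presence deliveries per second across the cluster. This is an average only; the actual presence traffic arrived in waves.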

If you are interested in more details, please continue reading...

I guess the first question that comes to mind is why the traffic was so low. Looking at the presented charts, there is surely room for more.

The CPU would most likely handle more, probably at least twice as much traffic, and memory usage shouldn't grow much either, as traffic generates only temporary objects.

Indeed, the average traffic was estimated at a message every 200 seconds and a presence broadcast every 20 minutes on each user connection.

The "high" traffic was estimated at a message every 100 seconds and a presence broadcast every 10 minutes.

Unfortunately, as always with load tests, the problem was generating enough traffic. I used Tsung 1.3 for testing, which did a really good job simulating user connections from 10 other machines; however, it just couldn't do more than that.

Test environment

I had 21 identical machines at my disposal for the duration of the tests: 2 x Xeon Quad 2.0GHz, 16GB RAM, 750GB SATA HDD, 1Gb Ethernet.

One machine running Ubuntu Server 9.04 was used as the database, with MySQL 5.1 installed and tuned for the test.

10 machines ran Ubuntu Server 9.04 with the Tigase server installed in cluster mode, with Linux kernel and GC settings tuned for the test. The Tigase server was a version from SVN with some not yet committed changes.

10 machines ran Proxmox 1.3 with Debian 5 in virtual machines. Tsung 1.3 on Erlang R13B01 was used as the traffic and load generator.


As we can see on the attached charts, both tests were quite successful.

Of course nobody wants to run a service for 1mln 600k online users with idle connections. The second test was executed only to check the limits of the installation. As we can see on the memory chart, the server used up all available memory, so with 16GB of RAM not much more is possible. Traffic was at a quite stable level, as it was generated only by new user connections in the first phase, then by both new and closing connections in the second phase (hence the jump in CPU load), and by closing connections only in the third phase.
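
As a rough illustration of why memory is the limit here, the sketch below divides the total RAM of the 10 cluster nodes by the peak number of connections. It is only an upper bound: it assumes all 16GB per node were available to the server, whereas the operating system and JVM overhead also take a share, and the actual heap size is not given in this article.

```python
# Upper-bound estimate of memory available per connection.
# Assumption (not from the article): all physical RAM on each node is
# counted, although the OS and JVM overhead also consume part of it.

GIB = 1024 ** 3


def ram_per_connection_kib(nodes: int, ram_per_node_gib: int, connections: int) -> float:
    """Total cluster RAM divided by the number of connections, in KiB."""
    total_bytes = nodes * ram_per_node_gib * GIB
    return total_bytes / connections / 1024


if __name__ == "__main__":
    budget = ram_per_connection_kib(nodes=10, ram_per_node_gib=16, connections=1_685_000)
    print(f"at most {budget:.1f} KiB of RAM per connection")
```

With the peak of 1mln 685k connections this gives a budget of just under 100 KiB per connection, which has to cover the session, the 150-element roster and socket buffers.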

Much more interesting are the charts for the 1mln online users test with traffic on each connection. We can clearly see "steps" on the cluster traffic chart and less clear steps on the session manager traffic chart. They are related to the presence update "wave" which started every 2800 seconds. CPU usage stayed at about 60% at peak time, leaving plenty of room for more traffic. Memory consumption was quite high, at about 70% at the peak number of connections.

Other tests

As I mentioned before, I ran several tests to see how the server works under different conditions. There is certainly no room here to present all the charts; however, I could post them if there is interest. Please send me a message or comment on the article if you want to see more charts.

The server was tested under different loads:

  1. A message every 100 seconds and presence broadcast every 700 seconds on each connection.

  2. A message every 200 seconds and presence broadcast every 1400 seconds on each connection.

  3. A message every 400 seconds and presence broadcast every 2800 seconds on each connection.

  4. A message every 800 seconds and presence broadcast every 5600 seconds on each connection.

  5. No traffic except packets related to user login, roster retrieval, initial presence broadcast and offline presence broadcast.
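
The five loads above can be written down as data to compare the message rate each one implies for a given number of connections. This is only an arithmetic sketch; the `LoadProfile` class and the function name are hypothetical and not part of Tigase or Tsung.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LoadProfile:
    """Per-connection intervals in seconds; None means no such traffic."""
    message_interval_s: Optional[int]
    presence_interval_s: Optional[int]


# The five loads listed above, keyed by their number in the list.
LOAD_PROFILES: Dict[int, LoadProfile] = {
    1: LoadProfile(100, 700),
    2: LoadProfile(200, 1400),
    3: LoadProfile(400, 2800),
    4: LoadProfile(800, 5600),
    5: LoadProfile(None, None),  # login, roster and presence on connect/close only
}


def cluster_messages_per_second(load: int, connections: int) -> float:
    """Average messages per second generated by all connections together."""
    profile = LOAD_PROFILES[load]
    if profile.message_interval_s is None:
        return 0.0
    return connections / profile.message_interval_s


if __name__ == "__main__":
    for load, connections in [(1, 250_000), (2, 500_000), (3, 1_000_000)]:
        rate = cluster_messages_per_second(load, connections)
        print(f"load {load}, {connections} connections: {rate:.0f} msg/s")
```

In this sketch 250k connections at load 1, 500k at load 2 and 1mln at load 3 all come out at the same average of 2500 messages per second, since each load doubles the intervals of the previous one.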

Other tests I have run are listed below:

  1. 250k connections over plain TCP with load 1

  2. 250k connections over SSL with load 1

  3. 500k connections over plain TCP with load 1 and 2

  4. 500k connections over SSL with load 1, 2 and 3

  5. 750k connections over plain TCP with load 2 and 3

  6. 1mln connections over plain TCP with load 2 and 3

  7. 1mln 500k connections over plain TCP with load 5

Please note that the given maximum number of connections is a target number; actual tests usually reached more.


All charts display plots for all 10 cluster nodes, with a different colour for each node. In most cases only one plot (blue) is visible, as user distribution was very even and hence load was the same. This is especially confusing on the connections chart, where all 10 plots look like a single blue line.

While chart plots display values for a particular node, the chart title displays the sum for all nodes; the max is the maximum total registered by the monitor.
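
The relation between the per-node plots and the values in the chart title can be sketched as follows. This is not the monitor's actual code; the function names and the sample data are made up for illustration, and it assumes every node reports a sample at the same points in time.

```python
from typing import Dict, List, Tuple


def cluster_totals(samples_by_node: Dict[str, List[int]]) -> List[int]:
    """Sum the per-node values at each sampling point."""
    return [sum(values) for values in zip(*samples_by_node.values())]


def chart_title_values(samples_by_node: Dict[str, List[int]]) -> Tuple[int, int]:
    """Return (latest cluster total, maximum cluster total seen so far)."""
    totals = cluster_totals(samples_by_node)
    return totals[-1], max(totals)


if __name__ == "__main__":
    # Hypothetical connection counts for two nodes at three sampling points.
    samples = {
        "node1": [100, 200, 150],
        "node2": [110, 190, 140],
    }
    total, maximum = chart_title_values(samples)
    print(f"total: {total}, max: {maximum}")
```

Each plotted line corresponds to one list in `samples`, while the title reports the sum across nodes and the largest such sum observed.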
