
Fielddata in Elasticsearch (rough translation)

水东流
Published 2015/11/08 19:24

In order for aggregations (or any operation that requires access to field values) to be fast, access to fielddata must be fast, which is why it is loaded into memory. But loading too much data into memory will cause slow garbage collections as the JVM tries to find extra space in the heap, or possibly even an OutOfMemory exception.

It may surprise you to find that Elasticsearch does not load into fielddata just the values for the documents that match your query. It loads the values for all documents in your index, even documents with a different _type!

The logic is: if you need access to documents X, Y, and Z for this query, you will probably need access to other documents in the next query. It is cheaper to load all values once, and to keep them in memory, than to have to scan the inverted index on every request.



Fielddata Size

The indices.fielddata.cache.size setting controls how much heap space is allocated to fielddata. When you run a query that requires access to new field values, Elasticsearch loads the values into memory and then tries to add them to fielddata. If the resulting fielddata size would exceed the specified size, other values are evicted to make space.
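To make the eviction behaviour concrete, here is a minimal sketch of a size-bounded store in Python. It is a toy model, not Elasticsearch's implementation: the class name BoundedFielddata is hypothetical, and evicting the least recently used field first is an assumption of this sketch.

```python
from collections import OrderedDict


class BoundedFielddata:
    """Toy size-bounded store (hypothetical; assumes LRU eviction)."""

    def __init__(self, max_bytes=None):
        self.max_bytes = max_bytes   # None models an unbounded setting
        self.fields = OrderedDict()  # field name -> size in bytes, oldest first
        self.used = 0
        self.evictions = 0

    def load(self, field, size):
        if field in self.fields:
            self.fields.move_to_end(field)  # already resident: mark as recently used
            return
        # Evict the oldest entries until the new field fits or nothing is left.
        # A field larger than max_bytes is still stored in this toy model.
        while (self.max_bytes is not None and self.fields
               and self.used + size > self.max_bytes):
            _, freed = self.fields.popitem(last=False)
            self.used -= freed
            self.evictions += 1
        self.fields[field] = size
        self.used += size


cache = BoundedFielddata(max_bytes=100)
cache.load("status", 60)
cache.load("host", 30)
cache.load("path", 40)  # 90 + 40 exceeds 100, so "status" is evicted first
print(list(cache.fields), cache.used, cache.evictions)  # ['host', 'path'] 70 1
```

Constructing the store with max_bytes=None never evicts anything, so its memory use only grows as new fields are loaded.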

By default, this setting is unbounded: Elasticsearch will never evict data from fielddata. (What a trap: by default, data only goes in and never comes out. Ugh.)

This default was chosen deliberately: fielddata is not a transient cache (understandable: it is not a short-lived cache). It is an in-memory data structure that must be accessible for fast execution, and it is expensive to build. If you have to reload data for every request, performance is going to be awful.


A bounded size forces the data structure to evict data. We will look at when to set this value, but first a warning:

(Note: you should set a value so that fielddata is evicted automatically, but heed the warning below.)

This setting is a safeguard, not a solution for insufficient memory.

If you don’t have enough memory to keep your fielddata resident in memory, Elasticsearch will constantly have to reload data from disk, and evict other data to make space. Evictions cause heavy disk I/O and generate a large amount of garbage in memory, which must be garbage collected later on. (Note: if memory is too small and eviction is enabled, data will be evicted and reloaded over and over, which puts heavy pressure on I/O and leaves a lot of garbage in memory.)

Imagine that you are indexing logs, using a new index every day. Normally you are interested in data from only the last day or two. Although you keep older indices around, you seldom need to query them. However, with the default settings, the fielddata from the old indices is never evicted! fielddata will just keep on growing until you trip the fielddata circuit breaker (see Circuit Breaker), which will prevent you from loading any more fielddata. 

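(Note: so if this size is not set, data just keeps being added to fielddata, and once the threshold set for the breaker is reached, an exception is raised. With this default, ES performs no fielddata eviction at all.)

(Note: if we do set the size, fielddata is evicted automatically once it reaches that size, so the size should be smaller than the threshold set for the breaker.)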


At that point, you’re stuck. While you can still run queries that access fielddata from the old indices, you can’t load any new values. Instead, we should evict old values to make space for the new values.

To prevent this scenario, place an upper limit on the fielddata by adding this setting to the config/elasticsearch.yml file:

indices.fielddata.cache.size: 40% 

This can be set to a percentage of the heap size, or to a concrete value such as 5gb.

(Note: indices.fielddata.cache.expire can set an expiry time, but do not use it; it is not recommended.)


In Fielddata Size, we spoke about adding a limit to the size of fielddata, to ensure that old unused fielddata can be evicted. The relationship between indices.fielddata.cache.size and indices.breaker.fielddata.limit is an important one. If the circuit-breaker limit is lower than the cache size, no data will ever be evicted. In order for it to work properly, the circuit breaker limit must be higher than the cache size.
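The relationship between the two settings can be checked mechanically. The following Python sketch is only an illustration: the helper names to_bytes and eviction_can_happen, the 8 GB heap, and the 60% breaker value are assumptions made for the example, and the parser only handles the simple "40%" and "5gb" styles of value shown above.

```python
import re

UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}


def to_bytes(value, heap_bytes):
    """Convert a setting such as '40%' or '5gb' to a byte count."""
    value = value.strip().lower()
    if value.endswith("%"):
        return int(heap_bytes * float(value[:-1]) / 100)
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(b|kb|mb|gb)", value)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(float(match.group(1)) * UNITS[match.group(2)])


def eviction_can_happen(cache_size, breaker_limit, heap_bytes):
    """True when the cache bound is reached before the breaker limit."""
    return to_bytes(cache_size, heap_bytes) < to_bytes(breaker_limit, heap_bytes)


heap = 8 * 1024**3  # hypothetical 8 GB heap
print(eviction_can_happen("40%", "60%", heap))  # True
print(eviction_can_happen("5gb", "60%", heap))  # False: 5 GB is above 60% of 8 GB
```

Comparing both values in bytes matters when one is a percentage and the other an absolute size, as the second call shows.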

Monitoring fielddata

It is important to keep a close watch on how much memory is being used by fielddata, and whether any data is being evicted. High eviction counts can indicate a serious resource issue and a reason for poor performance.

Fielddata usage can be monitored:

  • per-index using the indices-stats API:

    GET /_stats/fielddata?fields=*
  • per-node using the nodes-stats API:

    GET /_nodes/stats/indices/fielddata?fields=*
  • or even per-index per-node:

    GET /_nodes/stats/indices/fielddata?level=indices&fields=*

By setting ?fields=*, the memory usage is broken down for each field.
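As a rough illustration of how such a response might be consumed, the Python sketch below summarizes fielddata memory and evictions per node. The helper name summarize_fielddata and the sample body are hypothetical, and the key layout (nodes, indices, fielddata, memory_size_in_bytes, evictions, fields) is an assumption about the nodes-stats response that may vary between Elasticsearch versions.

```python
def summarize_fielddata(nodes_stats):
    """Return {node_id: (total_bytes, evictions, {field: bytes})}."""
    summary = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        fd = node.get("indices", {}).get("fielddata", {})
        per_field = {
            name: info.get("memory_size_in_bytes", 0)
            for name, info in fd.get("fields", {}).items()
        }
        summary[node_id] = (
            fd.get("memory_size_in_bytes", 0),
            fd.get("evictions", 0),
            per_field,
        )
    return summary


# Hypothetical response body, trimmed to the keys read above.
sample = {
    "nodes": {
        "node-1": {
            "indices": {
                "fielddata": {
                    "memory_size_in_bytes": 3000,
                    "evictions": 2,
                    "fields": {
                        "status": {"memory_size_in_bytes": 1000},
                        "host": {"memory_size_in_bytes": 2000},
                    },
                }
            }
        }
    }
}

summary = summarize_fielddata(sample)
for node_id, (total, evictions, per_field) in summary.items():
    flag = " (evicting)" if evictions > 0 else ""
    print(f"{node_id}: {total} bytes, {evictions} evictions{flag}")
    # List the largest fields first.
    for name, size in sorted(per_field.items(), key=lambda kv: -kv[1]):
        print(f"  {name}: {size} bytes")
```

In practice the dictionary would come from the JSON body returned by the nodes-stats request above rather than from a hard-coded sample.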


(Open question: even with size set, what should be done if the data that needs to be loaded still exceeds the breaker limit?)




© Copyright belongs to the author
