Elasticsearch scroll

元禛慎独
Published 2017/06/27 11:30

Source: Elasticsearch Reference [2.0], Search APIs, Request Body Search, Scroll

Scroll

While a search request returns a single “page” of results, the scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

Scrolling is not intended for real time user requests, but rather for processing large amounts of data, e.g. in order to reindex the contents of one index into a new index with a different configuration.

Client support for scrolling and reindexing

Some of the officially supported clients provide helpers to assist with scrolled searches and reindexing of documents from one index to another:

Perl

See Search::Elasticsearch::Bulk and Search::Elasticsearch::Scroll

Python

See elasticsearch.helpers.*

Note

The results that are returned from a scroll request reflect the state of the index at the time that the initial search request was made, like a snapshot in time. Subsequent changes to documents (index, update or delete) will only affect later search requests.

In order to use scrolling, the initial search request should specify the scroll parameter in the query string, which tells Elasticsearch how long it should keep the “search context” alive (see Keeping the search context alive), e.g. ?scroll=1m.

curl -XGET 'localhost:9200/twitter/tweet/_search?scroll=1m' -d '
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
'

The result from the above request includes a _scroll_id, which should be passed to the scroll API in order to retrieve the next batch of results.

curl -XGET 'localhost:9200/_search/scroll' -d '
{
    "scroll" : "1m",
    "scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1"
}
'

Note

Added in 2.0.0-beta1.

Body based parameters were added in 2.0.0

In the request above:

  • GET or POST can be used.
  • The URL should not include the index or type name; these are specified in the original search request instead.
  • The scroll parameter tells Elasticsearch to keep the search context open for another 1m.
  • The scroll_id parameter carries the _scroll_id returned by the previous request.

Each call to the scroll API returns the next batch of results until there are no more results left to return, i.e. the hits array is empty.

For backwards compatibility, scroll_id and scroll can be passed in the query string, and the scroll_id can be passed in the request body:

curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d 'c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1'

Important

The initial search request and each subsequent scroll request return a new _scroll_id; only the most recent _scroll_id should be used.
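
The request sequence described above can be sketched as a small loop. This is only an illustration under stated assumptions: send is a hypothetical transport callable (method, path, body) that returns the parsed JSON response, and scroll_batches is a name invented here, not part of any client library. The paths and body fields are the ones used in the curl examples.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_batches(send: Send, search_path: str, query: Dict[str, Any],
                   keep_alive: str = "1m") -> Iterator[List[Dict[str, Any]]]:
    """Yield each batch of hits until the scroll is exhausted."""
    # Initial search: the scroll parameter goes in the query string.
    response = send("GET", f"{search_path}?scroll={keep_alive}", query)
    while True:
        hits = response["hits"]["hits"]
        if not hits:  # an empty hits array means nothing is left
            return
        yield hits
        # Always pass on the _scroll_id from the most recent response.
        response = send("GET", "/_search/scroll",
                        {"scroll": keep_alive,
                         "scroll_id": response["_scroll_id"]})
```

In practice send could wrap urllib or one of the official clients; keeping it injectable also lets the loop be exercised without a running cluster.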

Note

If the request specifies aggregations, only the initial search response will contain the aggregation results.

Efficient scrolling with Scroll-Scan

Deep pagination with from and size (e.g. ?size=10&from=10000) is very inefficient because, in this example, each shard has to retrieve its top 10,010 sorted results (from + size), and these then have to be re-sorted in order to return just 10 results. This process has to be repeated for every page requested.
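
To make the cost concrete, the following sketch does the arithmetic under the assumption that each shard must produce its top from + size hits before they are merged. The function name and the shard count of 5 are illustrative assumptions, not values taken from a real cluster.

```python
def deep_pagination_cost(from_: int, size: int, shards: int) -> dict:
    """Rough count of sorted hits handled to serve one deep page."""
    per_shard = from_ + size        # each shard builds its own top from+size
    gathered = per_shard * shards   # all of them are merged and re-sorted
    return {
        "per_shard": per_shard,
        "gathered": gathered,
        "discarded": gathered - size,  # only `size` hits are returned
    }


# ?size=10&from=10000 against an assumed 5 shards
print(deep_pagination_cost(from_=10000, size=10, shards=5))
# → {'per_shard': 10010, 'gathered': 50050, 'discarded': 50040}
```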

The scroll API keeps track of which results have already been returned and so is able to return sorted results more efficiently than with deep pagination. However, sorting results (which happens by default) still has a cost.

Normally, you just want to retrieve all results and the order doesn’t matter. Scrolling can be combined with the scan search type to disable any scoring or sorting and to return results in the most efficient way possible. All that is needed is to add search_type=scan to the query string of the initial search request:

curl 'localhost:9200/twitter/tweet/_search?scroll=1m&search_type=scan'  -d '
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    }
}
'

Setting search_type to scan disables sorting and makes scrolling very efficient.

A scanning scroll request differs from a standard scroll request in four ways:

  • No score is calculated and sorting is disabled. Results are returned in the order they appear in the index.
  • Aggregations are not supported.
  • The response of the initial search request will not contain any results in the hits array. The first results will be returned by the first scroll request.
  • The size parameter controls the number of results per shard, not per request, so a size of 10 which hits 5 shards will return a maximum of 50 results per scroll request.
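
The differences listed above change how a client loop has to be written: the hits of the initial response must be skipped rather than treated as the end of the scroll. The sketch below shows one way to do that. As before, send is a hypothetical transport callable (method, path, body) returning parsed JSON, and scan_batches is a name made up for this example.

```python
from typing import Any, Callable, Dict, Iterator, List

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scan_batches(send: Send, search_path: str, query: Dict[str, Any],
                 keep_alive: str = "1m",
                 size: int = 10) -> Iterator[List[Dict[str, Any]]]:
    """Yield batches from a search_type=scan scroll."""
    # The initial scan response carries a _scroll_id but no hits,
    # so its (empty) hits array must not be read as "finished".
    first = send(
        "GET",
        f"{search_path}?scroll={keep_alive}&search_type=scan&size={size}",
        query,
    )
    scroll_id = first["_scroll_id"]
    while True:
        response = send("GET", "/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
        hits = response["hits"]["hits"]
        if not hits:
            return
        # size applies per shard: a batch may hold up to size * shards hits.
        yield hits
        scroll_id = response["_scroll_id"]
```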

If you want the scoring to happen, even without sorting on it, set the track_scores parameter to true.

Keeping the search context alive

The scroll parameter (passed to the search request and to every scroll request) tells Elasticsearch how long it should keep the search context alive. Its value (e.g. 1m, see the section called “Time units”) does not need to be long enough to process all data; it just needs to be long enough to process the previous batch of results. Each scroll request (with the scroll parameter) sets a new expiry time.
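
The point that only the slowest single batch matters, not the total run time, can be illustrated with a toy model. It is a simplification that ignores network latency and server-side timing details; context_survives is a hypothetical helper invented for this example.

```python
from typing import Sequence


def context_survives(keep_alive_s: float,
                     batch_seconds: Sequence[float]) -> bool:
    """True if every batch is processed before the context would expire."""
    # Each scroll request resets the expiry, so the keep-alive only has to
    # cover the processing time of one batch, never the sum of all batches.
    return all(t < keep_alive_s for t in batch_seconds)


# 95 seconds of work in total, but no single batch exceeds the 60s keep-alive.
print(context_survives(60, [20, 45, 30]))  # → True
# One slow batch (75s) lets the context expire before the next request.
print(context_survives(60, [20, 75]))      # → False
```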

Normally, the background merge process optimizes the index by merging together smaller segments to create new bigger segments, at which time the smaller segments are deleted. This process continues during scrolling, but an open search context prevents the old segments from being deleted while they are still in use. This is how Elasticsearch is able to return the results of the initial search request, regardless of subsequent changes to documents.

Tip

Keeping older segments alive means that more file handles are needed. Ensure that you have configured your nodes to have ample free file handles. See the section called “File Descriptors”.

You can check how many search contexts are open with the nodes stats API:

curl -XGET 'localhost:9200/_nodes/stats/indices/search?pretty'
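
Once the stats response has been parsed as JSON, the per-node counts can be pulled out with a few lines. The sketch assumes the response nests the value under nodes.<node id>.indices.search.open_contexts; the sample document is made up for illustration, and open_search_contexts is a hypothetical helper name.

```python
from typing import Any, Dict


def open_search_contexts(stats: Dict[str, Any]) -> Dict[str, int]:
    """Map each node (by name, else by id) to its open search contexts."""
    return {
        node.get("name", node_id): node["indices"]["search"]["open_contexts"]
        for node_id, node in stats["nodes"].items()
    }


# Made-up sample in the assumed response layout.
sample = {
    "nodes": {
        "abc123": {"name": "node-1",
                   "indices": {"search": {"open_contexts": 2}}},
        "def456": {"name": "node-2",
                   "indices": {"search": {"open_contexts": 0}}},
    }
}
print(open_search_contexts(sample))  # → {'node-1': 2, 'node-2': 0}
```

A count that keeps growing between runs is a hint that some scrolls are never cleared.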

Clear scroll API

Search contexts are automatically removed when the scroll timeout has been exceeded. However, keeping scrolls open has a cost, as discussed in the previous section, so scrolls should be explicitly cleared with the clear-scroll API as soon as they are no longer being used:

curl -XDELETE localhost:9200/_search/scroll -d '
{
    "scroll_id" : ["c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1"]
}'

Note

Added in 2.0.0-beta1.

Body based parameters were added in 2.0.0

Multiple scroll IDs can be passed as an array:

curl -XDELETE localhost:9200/_search/scroll -d '
{
    "scroll_id" : ["c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1", "aGVuRmV0Y2g7NTsxOnkxaDZ"]
}'

All search contexts can be cleared with the _all parameter:

curl -XDELETE localhost:9200/_search/scroll/_all

The scroll_id can also be passed as a query string parameter or in the request body. Multiple scroll IDs can be passed as comma-separated values:

curl -XDELETE localhost:9200/_search/scroll \
     -d 'c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1,aGVuRmV0Y2g7NTsxOnkxaDZ'
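
One way to make sure a scroll is always cleared is to issue the DELETE in a finally block, so the context is released even when processing fails partway. This is a sketch under the same assumptions as the curl examples: send is a hypothetical transport callable (method, path, body) returning parsed JSON, and scroll_then_clear and handle are illustrative names.

```python
from typing import Any, Callable, Dict, List

# Hypothetical transport: send(method, path, body) -> parsed JSON response.
Send = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


def scroll_then_clear(send: Send, search_path: str, query: Dict[str, Any],
                      handle: Callable[[List[Dict[str, Any]]], None],
                      keep_alive: str = "1m") -> None:
    """Process every batch, then clear the scroll even if handle() raises."""
    response = send("GET", f"{search_path}?scroll={keep_alive}", query)
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            handle(response["hits"]["hits"])
            response = send("GET", "/_search/scroll",
                            {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = response["_scroll_id"]  # keep only the newest id
    finally:
        # Body-based clear-scroll request, as in the array example above.
        send("DELETE", "/_search/scroll", {"scroll_id": [scroll_id]})
```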

For a translated (Chinese) version, see http://www.jianshu.com/p/14aa8b09c789

© Copyright belongs to the author
