
ES12 - Term-Level Queries

贾峰uk
Published 07/18 16:21

1. Introduction to term-level queries

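Full-text queries analyze the query string before executing, whereas term-level queries operate on the exact terms stored in the inverted index. These queries are usually used on structured data such as numbers, dates, and enums rather than on full-text fields. They also let you build low-level queries that skip the analysis process entirely.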
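To make the difference concrete, here is a toy Python sketch (not Elasticsearch code). The `analyze` function is a hypothetical analyzer that lowercases and splits on whitespace, and the two sample documents are made up. The index stores analyzed terms; the full-text lookup analyzes the query first, while the term-level lookup uses the query string verbatim.

```python
from collections import defaultdict

def analyze(text):
    """Hypothetical analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def build_index(docs):
    """Map each analyzed term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def full_text_match(index, query):
    """Full-text style: analyze the query, then OR the resulting terms."""
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits

def term_match(index, query):
    """Term-level style: look up the query string exactly as given."""
    return set(index.get(query, set()))

docs = {"1": "Quick Brown Fox", "2": "lazy dog"}
index = build_index(docs)

print(full_text_match(index, "Quick"))  # {'1'}: "Quick" is analyzed to "quick"
print(term_match(index, "Quick"))       # set(): the index only holds "quick"
print(term_match(index, "quick"))       # {'1'}
```

This is why a term-level query on an analyzed text field can return nothing even though the word visibly appears in the source document.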

2. term query

The term query searches for a single exact term. It was covered in the previous chapter, so it is not repeated here.

3. terms query

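The term query is very useful for finding a single value, but we often want to search for several values. For that, just use a single terms query (note the trailing s): terms is to term what a plural noun is to its singular form. A document matches if the field contains at least one of the listed terms.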

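The following query finds documents whose "title" contains any of the three terms "河北" (Hebei), "长生" (Changsheng), or "碧桂园" (Country Garden).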

GET telegraph/_search
{
  "query": {
    "terms": {
      "title": ["河北","长生","碧桂园"]
    }
  }
}

Query result

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "A5etp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "碧桂园集团副主席杨惠妍",
          "content": "杨惠妍分别于7月10日、11日买入碧桂园1000万股、1500万股",
          "author": "小财注",
          "pubdate": "2018-07-17T16:12:55"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "Apetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "长生生物再次跌停 三机构抛售近1000万元",
          "content": "长生生物再次一字跌停,报收19.89元,成交1432万元",
          "author": "长生生物",
          "pubdate": "2018-07-17T10:03:11"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "河北聚焦十大行业推进国际产能合作",
          "content": "河北省政府近日出台积极参与“一带一路”建设推进国际产能合作实施方案",
          "author": "财联社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}
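A terms query behaves like several term queries combined with OR. The toy Python sketch below models each document's indexed "title" terms as a set. The document IDs are taken from the responses in this article, but the term sets are assumptions about how an analyzer might have tokenized the Chinese titles, not Elasticsearch output.

```python
def terms_query(docs, field_terms):
    """Return sorted IDs of docs whose indexed terms overlap the requested terms."""
    wanted = set(field_terms)
    return sorted(doc_id for doc_id, terms in docs.items() if terms & wanted)

# Hypothetical indexed terms for the "title" field of each document.
docs = {
    "A5etp2QBW8hrYY3zGJk7": {"碧桂园", "集团", "副主席"},
    "Apetp2QBW8hrYY3zGJk7": {"长生", "生物", "跌停"},
    "BJetp2QBW8hrYY3zGJk7": {"河北", "十大", "行业"},
    "AZetp2QBW8hrYY3zGJk7": {"董事会", "一季报"},
}

print(terms_query(docs, ["河北", "长生", "碧桂园"]))
# ['A5etp2QBW8hrYY3zGJk7', 'Apetp2QBW8hrYY3zGJk7', 'BJetp2QBW8hrYY3zGJk7']
```

Each document needs only one overlapping term to match, which is why all three titles come back with the same constant score of 1.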

4. terms_set query

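Finds documents that match one or more of the specified terms, where the number of terms that must match is determined by a field (minimum_should_match_field) or a script (minimum_should_match_script).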
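The matching rule can be sketched in Python under one assumption: the threshold comes from a numeric field stored on each document, as with minimum_should_match_field. The field names `tags` and `required_matches` and the sample documents are hypothetical; this only models the counting logic, not scoring.

```python
def terms_set_query(docs, field, terms, min_field):
    """Return IDs of docs containing at least doc[min_field] of the given terms."""
    wanted = set(terms)
    hits = []
    for doc_id, doc in docs.items():
        matched = len(set(doc[field]) & wanted)  # distinct requested terms present
        if matched >= doc[min_field]:
            hits.append(doc_id)
    return hits

# Hypothetical documents: each carries its own required match count.
docs = {
    "1": {"tags": ["java", "python"], "required_matches": 2},
    "2": {"tags": ["java", "go"], "required_matches": 1},
    "3": {"tags": ["rust"], "required_matches": 1},
}

print(terms_set_query(docs, "tags", ["java", "python", "c"], "required_matches"))
# ['1', '2']
```

Document 1 needs both of its tags to be requested, document 2 needs only one, and document 3 has none of the requested terms.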

5. range query

The range query matches documents whose field value falls within a given range. It works on numeric, date, and string fields.

Date range query

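The following example queries documents whose publish time "pubdate" is between "2018-07-17T12:00:00" and "2018-07-17T16:30:00". Because gte and lte are used, both bounds are inclusive.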

GET telegraph/_search
{
  "query": {
    "range": {
      "pubdate": {
        "gte": "2018-07-17T12:00:00",
        "lte": "2018-07-17T16:30:00"
      }
    }
  }
}

Query result

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "AZetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "周五召开董事会会议 审议及批准更新后的一季报",
          "content": "以审议及批准更新后的2018年第一季度报告",
          "author": "中兴通讯",
          "pubdate": "2018-07-17T12:33:11"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "A5etp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "碧桂园集团副主席杨惠妍",
          "content": "杨惠妍分别于7月10日、11日买入碧桂园1000万股、1500万股",
          "author": "小财注",
          "pubdate": "2018-07-17T16:12:55"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "河北聚焦十大行业推进国际产能合作",
          "content": "河北省政府近日出台积极参与“一带一路”建设推进国际产能合作实施方案",
          "author": "财联社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}
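The date filter above can be replayed in Python over the publish times of the four telegraph documents that appear in this article, using `datetime.fromisoformat` to parse the timestamps. This models only the comparison lower <= pubdate <= upper, not how Elasticsearch stores or parses dates.

```python
from datetime import datetime

# Publish times of the telegraph documents that appear in this article.
pubdates = {
    "AZetp2QBW8hrYY3zGJk7": "2018-07-17T12:33:11",
    "A5etp2QBW8hrYY3zGJk7": "2018-07-17T16:12:55",
    "Apetp2QBW8hrYY3zGJk7": "2018-07-17T10:03:11",
    "BJetp2QBW8hrYY3zGJk7": "2018-07-17T14:14:55",
}

lower = datetime.fromisoformat("2018-07-17T12:00:00")  # gte
upper = datetime.fromisoformat("2018-07-17T16:30:00")  # lte

hits = [doc_id for doc_id, ts in pubdates.items()
        if lower <= datetime.fromisoformat(ts) <= upper]
print(hits)  # ['AZetp2QBW8hrYY3zGJk7', 'A5etp2QBW8hrYY3zGJk7', 'BJetp2QBW8hrYY3zGJk7']
```

Only the 10:03:11 document falls outside the window, matching the three hits returned above.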

Numeric range query

Create a new index and add data

DELETE my_person

PUT my_person

PUT my_person/stu/1
{
  "name":"sean",
  "age":20
}

PUT my_person/stu/2
{
  "name":"sum",
  "age":25
}

PUT my_person/stu/3
{
  "name":"dean",
  "age":30
}

PUT my_person/stu/4
{
  "name":"kastel",
  "age":35
}

Query people whose "age" is between 20 and 30 (inclusive)

GET my_person/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 30
      }
    }
  }
}

Query result

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "sum",
          "age": 25
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "age": 20
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "dean",
          "age": 30
        }
      }
    ]
  }
}
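The query above uses inclusive bounds (gte/lte), which is why ages 20 and 30 are both returned. The range query also accepts the exclusive bounds gt and lt. A small Python sketch of the four operators over the four ages indexed above (the helper name `range_filter` is hypothetical):

```python
def range_filter(values, gte=None, gt=None, lte=None, lt=None):
    """Keep values that satisfy every bound given; None means no bound."""
    def ok(v):
        return ((gte is None or v >= gte) and (gt is None or v > gt)
                and (lte is None or v <= lte) and (lt is None or v < lt))
    return [v for v in values if ok(v)]

ages = [20, 25, 30, 35]  # sean, sum, dean, kastel

print(range_filter(ages, gte=20, lte=30))  # [20, 25, 30]  inclusive bounds
print(range_filter(ages, gt=20, lt=30))    # [25]          exclusive bounds
print(range_filter(ages, gte=30))          # [30, 35]      no upper bound
```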

6. exists query

The exists query returns documents in which the field contains at least one non-null value.

Create the index and add data

DELETE my_person

PUT my_person

PUT my_person/stu/1
{
  "name":"sean",
  "hobby":"running"
}

PUT my_person/stu/2
{
  "name":"Jhon",
  "hobby":""
}

PUT my_person/stu/3
{
  "name":"sum",
  "hobby":["swimming",null]
}

PUT my_person/stu/4
{
  "name":"lily",
  "hobby":[null,null]
}

PUT my_person/stu/5
{
  "name":"lucy"
}

Query documents whose "hobby" field has a non-null value

GET my_person/_search
{
  "query": {
    "exists":{
      "field":"hobby"
    }
  }
}

Query result

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "Jhon",
          "hobby": ""
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "hobby": "running"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}

Match explanation:

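  • "hobby":"running"------the value is not null (matches)
  • "hobby":""------the value is an empty string, which is not null (matches)
  • "hobby":["swimming",null]------the array contains a non-null value (matches)
  • "hobby":[null,null]------every value in the array is null (does not match)
  • "name":"lucy"------the document has no hobby field (does not match)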
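The rules in the list above can be written as a small Python predicate over the five sample documents. It models the documented behavior only (a missing field, null, an empty array, or an array of nulls does not count; an empty string does), not how Elasticsearch evaluates exists internally. The helper name `field_exists` is hypothetical.

```python
def field_exists(doc, field):
    """True if doc[field] holds at least one non-null value ("" counts)."""
    if field not in doc:
        return False
    value = doc[field]
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)

docs = {
    "1": {"name": "sean", "hobby": "running"},
    "2": {"name": "Jhon", "hobby": ""},
    "3": {"name": "sum", "hobby": ["swimming", None]},
    "4": {"name": "lily", "hobby": [None, None]},
    "5": {"name": "lucy"},
}

print([i for i, d in docs.items() if field_exists(d, "hobby")])  # ['1', '2', '3']
```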

7. prefix query

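The prefix query finds documents whose field contains a term that starts with the given string. The following query finds documents whose "hobby" starts with "sw". (The results also include document 6, {"name":"deak","hobby":"swimming"}, which is not created in the setup above and must be indexed separately to reproduce them.)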

GET my_person/_search
{
  "query": {
    "prefix": {
      "hobby": {
        "value": "sw"
      }
    }
  }
}

Query result

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "6",
        "_score": 1,
        "_source": {
          "name": "deak",
          "hobby": "swimming"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}
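Because a term-level query works on indexed terms, a prefix query can be pictured as a scan over the sorted term dictionary: jump to the first term that is >= the prefix, then collect terms until one no longer starts with it. The Python sketch below illustrates the idea with `bisect` on a sorted list. Lucene's real term dictionary is a different data structure, and "surfing" and "swinging" are made-up terms added only to make the scan visible.

```python
import bisect

def prefix_terms(sorted_terms, prefix):
    """Return all terms in a sorted list that start with prefix."""
    start = bisect.bisect_left(sorted_terms, prefix)
    result = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can share the prefix
        result.append(term)
    return result

# Hypothetical sorted term dictionary: "running" and "swimming" come from
# this article, the other two terms are invented for illustration.
terms = sorted(["running", "swimming", "surfing", "swinging"])
print(prefix_terms(terms, "sw"))  # ['swimming', 'swinging']
```

Documents containing any of the collected terms are then returned, which is how documents 3 and 6 match "sw" above.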

8. wildcard query

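Wildcard query: * matches any sequence of characters (including none) and ? matches exactly one character. Patterns that begin with * or ?, like the one below, can be slow because many terms must be examined. The following query finds documents whose "hobby" matches "*ing".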

GET my_person/_search
{
  "query": {
    "wildcard": {
      "hobby": {
        "value": "*ing"
      }
    }
  }
}

Query result

{
  "took": 27,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "6",
        "_score": 1,
        "_source": {
          "name": "deak",
          "hobby": "swimming"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "hobby": "running"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}
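A wildcard pattern can be translated into an anchored regular expression: * becomes .*, ? becomes ., and every other character is matched literally. A Python sketch of that translation, in which the helper `wildcard_to_regex` is hypothetical and Python's `re` module stands in for Lucene's automaton. Backslash escapes in the pattern are not handled here, and "ski" is a made-up term.

```python
import re

def wildcard_to_regex(pattern):
    """Translate * and ? wildcards into a regex; escape everything else."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def wildcard_match(term, pattern):
    # fullmatch: the pattern must cover the whole term, not just part of it.
    return re.fullmatch(wildcard_to_regex(pattern), term, flags=re.DOTALL) is not None

terms = ["running", "swimming", "ski"]
print([t for t in terms if wildcard_match(t, "*ing")])  # ['running', 'swimming']
print(wildcard_match("swim", "sw?m"))                   # True
```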

9. regexp query

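The performance of a regexp query depends heavily on the regular expression chosen. Expressions that match anything, such as .*, are very slow, as are lookaround-style expressions. If possible, put a long literal prefix in front of the variable part of the expression; wildcard-like constructs such as .*?+ mostly lower performance. Note also that most regular expression engines let a pattern match any part of a string, so you must anchor it explicitly with ^ (start) or $ (end). Lucene's regular expressions, by contrast, are always anchored: the pattern must match the entire term, and anchor operators such as ^ and $ are not supported.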

Metacharacter / meaning / description / example:

  • . (match any character): the period "." represents any single character. Example: ab. matches "abc" and "ab1".
  • + (one or more): the plus sign "+" repeats the preceding shortest pattern one or more times. Example: "aaabbb" matches a+b+.
  • * (zero or more): the asterisk "*" matches the preceding shortest pattern zero or more times. Example: "aaabbb" matches a*b*.
  • ? (zero or one): the question mark "?" makes the preceding shortest pattern optional; it matches zero or one times. Example: "aaabbb" matches aaa?bbbb?.
  • {m}, {m,n} (min to max): curly braces "{}" specify a minimum and, optionally, a maximum number of times the preceding shortest pattern can repeat. Example: "aaabbb" matches a{3}b{3} and a{2,4}b{2,4}.
  • () (grouping): parentheses "()" form sub-patterns. Example: "ababab" matches (ab)+.
  • | (alternation): the pipe symbol "|" acts as an OR operator. Example: "aabb" matches aabb|bbaa.
  • [] (character class): a set of possible characters can be written as a character class by enclosing it in square brackets "[]"; a leading ^ negates the class. Example: [abc] matches 'a', 'b', or 'c'.
  • ~ (complement): the shortest pattern that follows a tilde "~" is negated. For example, ab~cd means: starts with a, followed by b, followed by a string of any length that is anything but c, and ends with d. Example: "abcdef" matches ab~df and a~(cb)def, but not ab~cdef or a~(bc)def.
  • <> (interval): numeric ranges can be written inside angle brackets "<>". Example: "foo80" matches foo<1-100>.
  • & (intersection): the ampersand "&" joins two patterns so that both must match. Example: "aaabbb" matches aaa.+&.+bbb.
  • @ (any string): the at sign "@" matches any complete string. Example: @&~(foo.+) matches anything except strings that start with "foo" followed by at least one more character, so "foo" itself matches.

Query documents whose "hobby" value matches the regular expression "sw.+"

GET my_person/_search
{
  "query": {
    "regexp":{
      "hobby":"sw.+"
    }
  }
}

Query result

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "6",
        "_score": 1,
        "_source": {
          "name": "deak",
          "hobby": "swimming"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}
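Lucene regular expressions are always anchored: the pattern must match the whole term. That behavior can be illustrated with Python's `re.fullmatch` versus `re.search`. Python's regex dialect is not Lucene's (it has no ~, <>, or & operators, for example), so this only demonstrates anchoring on the sw.+ pattern used above; "answer" is a made-up term.

```python
import re

terms = ["swimming", "running", "answer"]

# Lucene-style: the whole term must match the pattern (anchored).
anchored = [t for t in terms if re.fullmatch(r"sw.+", t)]

# Typical regex-engine style: a match anywhere in the string is enough.
unanchored = [t for t in terms if re.search(r"sw.+", t)]

print(anchored)    # ['swimming']
print(unanchored)  # ['swimming', 'answer']
```

Under anchored semantics, "answer" is not a hit even though it contains "swer"; to find it in Elasticsearch you would need a pattern such as .*sw.+, which falls into the slow category described above.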

10. fuzzy query

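The fuzzy query returns documents that contain terms similar to the search term, where similarity is measured by edit distance: the number of single-character insertions, deletions, substitutions, or transpositions of adjacent characters needed to turn one term into the other.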

GET telegraph/_search
{
  "query": {
    "fuzzy": {
      "title": "十大"
    }
  }
}

查询结果

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.99277425,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 0.99277425,
        "_source": {
          "title": "河北聚焦十大行业推进国际产能合作",
          "content": "河北省政府近日出台积极参与“一带一路”建设推进国际产能合作实施方案",
          "author": "财联社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}
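With the default fuzziness of AUTO, Elasticsearch allows 0 edits for terms of up to 2 characters, 1 edit for 3 to 5 characters, and 2 edits for longer terms, and by default it counts a transposition of two adjacent characters as a single edit. The Python sketch below implements that rule with the optimal string alignment variant of Damerau-Levenshtein distance. It is an illustration, not Lucene's automaton-based implementation, and the helper names are hypothetical. It also shows that the two-character query "十大" used above must match a term exactly.

```python
def osa_distance(a, b):
    """Edit distance counting insert, delete, substitute, and adjacent swap."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]

def auto_fuzziness(term):
    """AUTO fuzziness with the default thresholds of 3 and 6 characters."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2

def fuzzy_match(query, term):
    return osa_distance(query, term) <= auto_fuzziness(query)

print(fuzzy_match("十大", "十大"))          # True: exact match
print(fuzzy_match("十大", "十个"))          # False: 2-char terms allow 0 edits
print(fuzzy_match("swimming", "swimmnig"))  # True: one transposition
print(fuzzy_match("runing", "running"))     # True: one insertion, 6 chars allow 2
```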

11. ids query

Returns documents based on a given list of document IDs (_id).

GET my_person/_search
{
  "query": {
    "ids": {
      "values": ["1","3","5"]
    }
  }
}

Query result

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "5",
        "_score": 1,
        "_source": {
          "name": "lucy"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "hobby": "running"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}
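An ids query is essentially a membership test on _id. A Python sketch over the five my_person documents from the exists example, with a hypothetical helper `ids_query`. The extra ID "9" does not exist and is simply absent from the result rather than causing an error.

```python
def ids_query(docs, values):
    """Return the documents whose _id is in values; unknown IDs are ignored."""
    wanted = set(values)
    return {doc_id: src for doc_id, src in docs.items() if doc_id in wanted}

docs = {
    "1": {"name": "sean", "hobby": "running"},
    "2": {"name": "Jhon", "hobby": ""},
    "3": {"name": "sum", "hobby": ["swimming", None]},
    "4": {"name": "lily", "hobby": [None, None]},
    "5": {"name": "lucy"},
}

print(sorted(ids_query(docs, ["1", "3", "5", "9"])))  # ['1', '3', '5']
```

Elasticsearch does not guarantee any particular order for these constant-score hits, which is why the response above lists document 5 first; the sketch sorts the IDs only to make the output deterministic.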

 

© Copyright belongs to the author
