ES12-Term-level queries

贾峰uk
Published 07/18 16:21

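1. Introduction to term-level queries

Full-text queries analyze the query string before executing it, whereas term-level queries operate on the exact terms stored in the inverted index. They are typically used for structured data such as numbers, dates, and enums rather than for full-text fields. They also let you craft low-level queries that skip the analysis step entirely.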
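To make the distinction concrete, here is a minimal pure-Python sketch of an inverted index. It is a toy model, not Elasticsearch code: the `analyze` function (lowercase, then split on whitespace) is only a stand-in for a real analyzer, and every name in it is hypothetical. A term-level lookup compares the input to the stored terms unchanged, while a full-text lookup analyzes the input first.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer (an assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Map each analyzed term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def term_lookup(index, term):
    """Term-level lookup: the input is compared to stored terms unchanged."""
    return index.get(term, set())


def match_lookup(index, text):
    """Full-text lookup: analyze the input first, then look up each term."""
    hits = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return hits


docs = {1: "Quick Brown Fox", 2: "quick start guide"}
index = build_index(docs)

exact_hits = term_lookup(index, "Quick")      # stored terms are lowercase
analyzed_hits = match_lookup(index, "Quick")  # "Quick" is analyzed to "quick"
print(exact_hits, analyzed_hits)
```

In this model the term-level lookup for "Quick" finds nothing because only the lowercase term "quick" was stored, which is the usual reason a term-level query on an analyzed text field returns no hits.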

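2. term query

The term query searches for a single exact term. It was covered in the previous chapter and is not repeated here.

3. terms query

The term query is useful for finding a single value, but we often want to search for several values at once. For that, use a single terms query (note the trailing s), which is simply the plural form of the term query: it takes a list of values and matches a document if the field contains any of them.

The following query finds documents whose "title" contains any of the three terms "河北", "长生", or "碧桂园".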

GET telegraph/_search
{
  "query": {
    "terms": {
      "title": ["河北","长生","碧桂园"]
    }
  }
}

Query result

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "A5etp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "碧桂园集团副主席杨惠妍",
          "content": "杨惠妍分别于7月10日、11日买入碧桂园1000万股、1500万股",
          "author": "小财注",
          "pubdate": "2018-07-17T16:12:55"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "Apetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "长生生物再次跌停 三机构抛售近1000万元",
          "content": "长生生物再次一字跌停,报收19.89元,成交1432万元",
          "author": "长生生物",
          "pubdate": "2018-07-17T10:03:11"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "河北聚焦十大行业推进国际产能合作",
          "content": "河北省政府近日出台积极参与“一带一路”建设推进国际产能合作实施方案",
          "author": "财联社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}
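
Conceptually, a terms query returns the union of the documents that contain any of the listed terms. The sketch below models this in plain Python over a hand-written posting table. The table contents and the document IDs are made up for illustration and are not taken from the telegraph index.

```python
def terms_lookup(postings, terms):
    """Return IDs of documents containing at least one of the given terms."""
    hits = set()
    for term in terms:
        hits |= postings.get(term, set())
    return hits


# Hypothetical posting lists: term -> IDs of documents containing it.
postings = {
    "河北": {"doc-a"},
    "长生": {"doc-b"},
    "碧桂园": {"doc-c"},
    "十大": {"doc-a"},
}

matched = terms_lookup(postings, ["河北", "长生", "碧桂园"])
print(sorted(matched))
```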

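4. terms_set query

Finds documents that match one or more of the supplied terms. The number of terms that must match is not fixed in the query: it is read for each document from a minimum-should-match field, or computed by a script.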
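
A minimal sketch of what such a request and its matching rule can look like, written as Python data plus a small pure-Python model. The field names `programming_languages` and `required_matches` are hypothetical; `terms` and `minimum_should_match_field` are the parameter names the terms_set query uses. The `matches` function only imitates the rule for illustration and is not how Elasticsearch evaluates the query.

```python
# Request body for a terms_set query, expressed as a Python dict.
# "programming_languages" and "required_matches" are hypothetical field names.
query = {
    "query": {
        "terms_set": {
            "programming_languages": {
                "terms": ["c++", "java", "php"],
                "minimum_should_match_field": "required_matches",
            }
        }
    }
}


def matches(doc, field, terms, min_field):
    """Model of the rule: enough of the query terms must appear in the field."""
    found = len(set(doc.get(field, [])) & set(terms))
    return found >= doc[min_field]


# Made-up documents; each one carries its own required number of matches.
docs = [
    {"name": "a", "programming_languages": ["java", "php"], "required_matches": 2},
    {"name": "b", "programming_languages": ["java"], "required_matches": 2},
    {"name": "c", "programming_languages": ["c++", "go"], "required_matches": 1},
]

spec = query["query"]["terms_set"]["programming_languages"]
matched = [
    d["name"]
    for d in docs
    if matches(d, "programming_languages", spec["terms"],
               spec["minimum_should_match_field"])
]
print(matched)
```

Document "b" is rejected because only one of the query terms appears in it while its own `required_matches` value asks for two.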

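5. range query

The range query matches documents in which a numeric, date, or string field falls within a given range.

Date range query

The following example finds documents whose publication time "pubdate" lies between "2018-07-17T12:00:00" and "2018-07-17T16:30:00", inclusive.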

GET telegraph/_search
{
  "query": {
    "range": {
      "pubdate": {
        "gte": "2018-07-17T12:00:00",
        "lte": "2018-07-17T16:30:00"
      }
    }
  }
}

Query result

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "AZetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "周五召开董事会会议 审议及批准更新后的一季报",
          "content": "以审议及批准更新后的2018年第一季度报告",
          "author": "中兴通讯",
          "pubdate": "2018-07-17T12:33:11"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "A5etp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "碧桂园集团副主席杨惠妍",
          "content": "杨惠妍分别于7月10日、11日买入碧桂园1000万股、1500万股",
          "author": "小财注",
          "pubdate": "2018-07-17T16:12:55"
        }
      },
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 1,
        "_source": {
          "title": "河北聚焦十大行业推进国际产能合作",
          "content": "河北省政府近日出台积极参与“一带一路”建设推进国际产能合作实施方案",
          "author": "财联社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}

Numeric range query

Create a new index and add data

DELETE my_person

PUT my_person

PUT my_person/stu/1
{
  "name":"sean",
  "age":20
}

PUT my_person/stu/2
{
  "name":"sum",
  "age":25
}

PUT my_person/stu/3
{
  "name":"dean",
  "age":30
}

PUT my_person/stu/4
{
  "name":"kastel",
  "age":35
}

Find people whose "age" is between 20 and 30, inclusive

GET my_person/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 30
      }
    }
  }
}

Query result

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "sum",
          "age": 25
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "age": 20
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "dean",
          "age": 30
        }
      }
    ]
  }
}
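
The bounds gte and lte used above are inclusive, which is why the documents with age 20 and age 30 are both returned. The range query also accepts the exclusive bounds gt and lt. The following pure-Python sketch models the four bound parameters against the ages indexed above; `in_range` is a hypothetical helper for illustration, not an Elasticsearch API.

```python
import operator

# Comparison applied by each bound parameter of the range query.
BOUNDS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, **bounds):
    """Return True if value satisfies every bound, e.g. gte=20, lte=30."""
    return all(BOUNDS[name](value, limit) for name, limit in bounds.items())


# Document ID -> age, as indexed in the example above.
ages = {"1": 20, "2": 25, "3": 30, "4": 35}

inclusive = sorted(i for i, age in ages.items() if in_range(age, gte=20, lte=30))
exclusive = sorted(i for i, age in ages.items() if in_range(age, gt=20, lt=30))
print(inclusive, exclusive)
```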

6. exists query

Returns documents in which the given field contains at least one non-null value.

Create the index and add data

DELETE my_person

PUT my_person

PUT my_person/stu/1
{
  "name":"sean",
  "hobby":"running"
}

PUT my_person/stu/2
{
  "name":"Jhon",
  "hobby":""
}

PUT my_person/stu/3
{
  "name":"sum",
  "hobby":["swimming",null]
}

PUT my_person/stu/4
{
  "name":"lily",
  "hobby":[null,null]
}

PUT my_person/stu/5
{
  "name":"lucy"
}

Find documents whose "hobby" is not null

GET my_person/_search
{
  "query": {
    "exists":{
      "field":"hobby"
    }
  }
}

Query result

{
  "took": 12,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "2",
        "_score": 1,
        "_source": {
          "name": "Jhon",
          "hobby": ""
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "hobby": "running"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}

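Matching notes:

  • "hobby":"running"------the value is non-null (matches)
  • "hobby":""------the value is an empty string, which is not null (matches)
  • "hobby":["swimming",null]------the array contains a non-null value (matches)
  • "hobby":[null,null]------every value in the array is null (does not match)
  • "name":"lucy"------there is no hobby field (does not match)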
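
The rules above can be sketched as a small pure-Python predicate. This is only a model of the behaviour shown in the results, with JSON null written as Python `None`; `field_exists` is a hypothetical helper, not part of any Elasticsearch client.

```python
def field_exists(doc, field):
    """Return True if the field holds at least one non-null value."""
    if field not in doc:
        return False
    value = doc[field]
    # Treat a single value as a one-element array.
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)


# The five documents indexed above.
docs = {
    "1": {"name": "sean", "hobby": "running"},
    "2": {"name": "Jhon", "hobby": ""},
    "3": {"name": "sum", "hobby": ["swimming", None]},
    "4": {"name": "lily", "hobby": [None, None]},
    "5": {"name": "lucy"},
}

matched = sorted(doc_id for doc_id, doc in docs.items() if field_exists(doc, "hobby"))
print(matched)
```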

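7. prefix query

Finds documents that contain a term beginning with the given string. The following query finds documents whose "hobby" starts with "sw". Note that the results below include a document with ID 6 (name "deak", hobby "swimming") that is not created by the setup requests shown in section 6; it was presumably indexed separately.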

GET my_person/_search
{
  "query": {
    "prefix": {
      "hobby": {
        "value": "sw"
      }
    }
  }
}

Query result

{
  "took": 11,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "6",
        "_score": 1,
        "_source": {
          "name": "deak",
          "hobby": "swimming"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}

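8. wildcard query

The wildcard query matches terms against a pattern in which * stands for any sequence of characters (including none) and ? stands for any single character. The following query finds documents whose hobby matches "*ing".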

GET my_person/_search
{
  "query": {
    "wildcard": {
      "hobby": {
        "value": "*ing"
      }
    }
  }
}

Query result

{
  "took": 27,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "6",
        "_score": 1,
        "_source": {
          "name": "deak",
          "hobby": "swimming"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "hobby": "running"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}
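
As a rough illustration of wildcard matching, the sketch below translates a wildcard pattern into a Python regular expression and requires it to cover the whole term. It is a pure-Python model under the assumption that only * and ? are special; the helper names and the sample term list are hypothetical. A prefix query such as "sw" behaves like the wildcard pattern "sw*" in this model.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * (any run of characters) and ? (one character) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))  # every other character is literal
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    """The whole term must match the pattern, not just a part of it."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


terms = ["running", "swimming", "sw", "ring"]

ing_terms = [t for t in terms if wildcard_match("*ing", t)]
prefix_terms = [t for t in terms if wildcard_match("sw*", t)]  # like prefix "sw"
print(ing_terms, prefix_terms)
```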

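9. regexp query

The performance of a regexp query depends heavily on the regular expression chosen. Patterns that match anything, such as .*, are very slow, as are lookaround expressions. Where possible, give the pattern a long literal prefix. Wildcard matchers such as .*?+ will mostly lower performance. Most regular expression engines let you match any part of a string, and you anchor a pattern explicitly with ^ for the start or $ for the end. Lucene patterns, which the regexp query uses, are always anchored: the pattern must match the entire term, so ^ and $ are not needed.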
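
The difference between anchored and unanchored matching can be sketched with Python's `re` module. This is only an analogy: Python's regular expression syntax is not the same as the Lucene syntax used by the regexp query, and the patterns below use only constructs common to both. `re.fullmatch` plays the role of an always-anchored pattern, `re.search` that of an unanchored one; the term list is made up.

```python
import re

terms = ["swimming", "running", "sw"]

# Anchored matching: the pattern must cover the entire term.
anchored = [t for t in terms if re.fullmatch(r"sw.+", t)]

# Unanchored matching: the pattern may match anywhere inside the term.
unanchored = [t for t in terms if re.search(r"nn", t)]

# The same pattern, anchored, matches no term because none is exactly "nn".
whole_term = [t for t in terms if re.fullmatch(r"nn", t)]
print(anchored, unanchored, whole_term)
```

Under anchored matching, finding "running" by its middle letters would need a pattern such as ".*nn.*", which is exactly the kind of expression the performance advice above warns about.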

Supported operators (metacharacter, meaning, description, example):

  • . (match any character)------the period "." stands for any single character. "ab." matches "abc" and "ab1".
  • + (one or more)------the plus sign "+" repeats the preceding shortest pattern one or more times. "aaabbb" matches a+b+.
  • * (zero or more)------the asterisk "*" matches the preceding shortest pattern zero or more times. "aaabbb" matches a*b*.
  • ? (zero or one)------the question mark "?" makes the preceding shortest pattern optional; it matches zero or one time. "aaabbb" matches aaa?bbbb?.
  • {m}, {m,n} (min to max)------curly brackets "{}" specify a minimum and, optionally, a maximum number of times the preceding shortest pattern can repeat. "aaabbb" matches a{3}b{3} and a{2,4}b{2,4}.
  • () (grouping)------parentheses "()" form sub-patterns. "ababab" matches (ab)+.
  • | (alternation)------the pipe symbol "|" acts as an OR operator. "aabb" matches aabb|bbaa.
  • [] (character classes)------square brackets "[]" enclose a set or range of possible characters; a leading ^ negates the class. [abc] matches "a", "b", or "c".
  • ~ (complement)------the shortest pattern that follows a tilde "~" is negated. "ab~cd" means: starts with a, followed by b, followed by a string of any length that is anything but c, and ends with d. "abcdef" matches ab~df and a~(cb)def, but does not match ab~cdef or a~(bc)def.
  • <> (interval)------angle brackets "<>" enclose a numeric range. "foo80" matches foo<1-100>.
  • & (intersection)------the ampersand "&" joins two patterns so that both must match. "aaabbb" matches aaa.+&.+bbb.
  • @ (any string)------the at sign "@" matches any string in its entirety. @&~(foo.+) matches any string except one that begins with "foo" and continues with at least one more character.

Find documents whose "hobby" field value matches the regular expression "sw.+"

GET my_person/_search
{
  "query": {
    "regexp":{
      "hobby":"sw.+"
    }
  }
}

Query result

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "6",
        "_score": 1,
        "_source": {
          "name": "deak",
          "hobby": "swimming"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}

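10. fuzzy query

The fuzzy query matches terms that are similar to the search term, where similarity is measured by edit distance.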

GET telegraph/_search
{
  "query": {
    "fuzzy": {
      "title": "十大"
    }
  }
}

Query result

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.99277425,
    "hits": [
      {
        "_index": "telegraph",
        "_type": "msg",
        "_id": "BJetp2QBW8hrYY3zGJk7",
        "_score": 0.99277425,
        "_source": {
          "title": "河北聚焦十大行业推进国际产能合作",
          "content": "河北省政府近日出台积极参与“一带一路”建设推进国际产能合作实施方案",
          "author": "财联社",
          "pubdate": "2018-07-17T14:14:55"
        }
      }
    ]
  }
}
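
Edit distance can be sketched in a few lines of Python. The `levenshtein` function below is the textbook dynamic-programming algorithm, and `auto_fuzziness` models the default thresholds of the AUTO fuzziness setting (terms of up to 2 characters must match exactly, 3 to 5 characters allow one edit, longer terms allow two). Both are illustrative helpers with hypothetical names, not Elasticsearch code, and the model does not count a transposition of two adjacent characters as a single edit, which Elasticsearch does by default.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or keep
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed for a term under the AUTO setting (default thresholds)."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


def fuzzy_match(query_term, indexed_term):
    """True if the indexed term is within the allowed edit distance."""
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)


print(fuzzy_match("swiming", "swimming"))  # one missing letter
print(fuzzy_match("十大", "十大"))
print(fuzzy_match("十大", "十五"))
```

Under these thresholds a two-character search term such as "十大" tolerates no edits, so with default settings the fuzzy query above behaves like an exact term match.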

11. ids query

Returns the documents whose IDs appear in the given list.

GET my_person/_search
{
  "query": {
    "ids": {
      "values": ["1","3","5"]
    }
  }
}

Query result

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "5",
        "_score": 1,
        "_source": {
          "name": "lucy"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "1",
        "_score": 1,
        "_source": {
          "name": "sean",
          "hobby": "running"
        }
      },
      {
        "_index": "my_person",
        "_type": "stu",
        "_id": "3",
        "_score": 1,
        "_source": {
          "name": "sum",
          "hobby": [
            "swimming",
            null
          ]
        }
      }
    ]
  }
}

 

© Copyright belongs to the author
