# An Introduction to Elasticsearch Scoring

2018/01/12 11:24

1. The Elasticsearch scoring formula

Elasticsearch's default scoring formula is Lucene's practical scoring function. The computation has two main parts: one scores the query, the other scores the field. Below is the formula as given in the official ES documentation:

score(q,d) = queryNorm(q)
           · coord(q,d)
           · ∑ ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )    (t in q)


## queryNorm(q):

queryNorm(q) = 1 / √sumOfSquaredWeights

sumOfSquaredWeights = idf(t1)·idf(t1) + idf(t2)·idf(t2) + ... + idf(tn)·idf(tn)
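As a quick sketch (illustrative Python, not ES source code; the function name is my own), queryNorm can be computed from the per-term idf weights like this:

```python
import math

def query_norm(idf_weights):
    """queryNorm(q) = 1 / sqrt(sumOfSquaredWeights)."""
    sum_of_squared_weights = sum(w * w for w in idf_weights)
    return 1.0 / math.sqrt(sum_of_squared_weights)

# Hypothetical two-term query whose terms have idf weights 2.0 and 3.0:
print(query_norm([2.0, 3.0]))  # 1 / sqrt(13) ≈ 0.27735
```

queryNorm does not change the ranking of documents for a given query; it only makes scores from different queries somewhat comparable.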

## coord(q,d):

coord(q,d) is a coordination factor that rewards documents matching more of the query terms. Its value is:

coord(q,d) = overlap / maxOverlap
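The coordination factor is just the ratio of matched query terms to total query terms; a minimal sketch:

```python
def coord(overlap, max_overlap):
    """coord(q,d) = overlap / maxOverlap: fraction of query terms found in the document."""
    return overlap / max_overlap

# A document matching 2 of 3 query terms is rewarded less than a full match:
print(round(coord(2, 3), 4))  # 0.6667
print(coord(3, 3))            # 1.0
```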

## tf(t in d):

tf(t in d) = √frequency


## idf(t):

idf(t) = 1 + log ( numDocs / (docFreq + 1))
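The tf and idf definitions above translate directly into code (a sketch of the classic Lucene formulas; the function names are my own):

```python
import math

def tf(frequency):
    """tf(t in d) = sqrt(frequency of term t in document d)."""
    return math.sqrt(frequency)

def idf(num_docs, doc_freq):
    """idf(t) = 1 + log(numDocs / (docFreq + 1)): rarer terms weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

print(tf(4))         # 2.0
print(idf(1000, 9))  # 1 + log(100) ≈ 5.6052
```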


## norm(t,d):

f.getBoost: the larger a field's boost value, the more important that field is.

lengthNorm: the longer the field, the less important it is; the shorter, the more important. In the official formula all boosts default to 1, so the official length norm reduces to:

norm(d) = 1 / √numTerms
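Putting the pieces together, the whole classic formula can be sketched as a single function (all boosts fixed at 1, as in the official description; this is an illustration, not the actual Lucene implementation):

```python
import math

def length_norm(num_terms):
    """norm(d) = 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)

def classic_score(term_stats, num_docs, field_num_terms, max_overlap):
    """Classic Lucene practical score with every boost fixed at 1.

    term_stats: list of (term_freq_in_doc, doc_freq) pairs, one per
    query term that matched the document.
    """
    idfs = [1.0 + math.log(num_docs / (df + 1)) for _, df in term_stats]
    # queryNorm(q) = 1 / sqrt(sum of squared idf weights)
    q_norm = 1.0 / math.sqrt(sum(w * w for w in idfs))
    # coord(q,d) = matched terms / total query terms
    crd = len(term_stats) / max_overlap
    norm = length_norm(field_num_terms)
    # Sum over matching terms of tf * idf^2 * norm (all boosts are 1).
    total = sum(math.sqrt(freq) * w * w * norm
                for (freq, _), w in zip(term_stats, idfs))
    return q_norm * crd * total

# Hypothetical example: a 2-term query, both terms matching once each,
# in an index of 1000 docs, against a field that is 4 terms long.
score = classic_score([(1, 9), (1, 99)], num_docs=1000,
                      field_num_terms=4, max_overlap=2)
print(round(score, 5))
```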


GET act_shop-2018.01.12/shop/_search
{
  "size": 1,
  "query": {
    "term": {
      "name.keyword": "星巴克"
    }
  },
  "explain": true
}


{
  "took": 25,
  "timed_out": false,
  "_shards": {
    "total": 150,
    "successful": 150,
    "failed": 0
  },
  "hits": {
    "total": 127667,
    "max_score": 15.511484,
    "hits": [
      {
        "_shard": "[act_shop-2018.01.12][80]",
        "_node": "6vfIeV95QOK1vAcLdx6CEA",
        "_index": "act_shop-2018.01.12",
        "_type": "shop",
        "_id": "187672",
        "_score": 15.511484,
        "_routing": "36341",
        "_parent": "36341",
        "_source": {
          "status": 1,
          "city": {
            "id": 2084,
            "name": "虹口区"
          },
          "update_time": "2017-10-23 15:23:00.329000",
          "tel": [
            "021-65200108"
          ],
          "name": "星巴克(凉城店)",
          "tags": [
            "餐饮服务",
            "咖啡厅",
            "咖啡厅"
          ],
          "tags_enrich": {
            "name": "美食",
            "id": 10
          },
          "id": 187672,
          "label": "have_act",
          "create_time": "2017-01-11 14:59:43.950000",
          "city_enrich": {
            "region": "华东地区",
            "name": "上海",
            "level": 1
          },
          "coordinate": {
            "lat": 31.29496,
            "lon": 121.475442
          },
          "brand": {
            "id": 490,
            "name": "星巴克"
          }
        },
        "_explanation": {
          "value": 15.511484,
          "description": "sum of:",
          "details": [
            {
              "value": 15.511484,
              "description": "sum of:",
              "details": [
                {
                  "value": 4.7601295,
                  "description": "weight(name:星 in 6914) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 4.7601295,
                      "description": "score(doc=6914,freq=1.0 = termFreq=1.0\n), product of:",
                      "details": [
                        {
                          "value": 4.314013,
                          "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                          "details": [
                            {
                              "value": 159,
                              "description": "docFreq",
                              "details": []
                            },
                            {
                              "value": 11920,
                              "description": "docCount",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 1.103411,
                          "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                          "details": [
                            {
                              "value": 1,
                              "description": "termFreq=1.0",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "parameter k1",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "parameter b",
                              "details": []
                            },
                            {
                              "value": 9.224329,
                              "description": "avgFieldLength",
                              "details": []
                            },
                            {
                              "value": 7.111111,
                              "description": "fieldLength",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                },
                {
                  "value": 5.0423846,
                  "description": "weight(name:巴 in 6914) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 5.0423846,
                      "description": "score(doc=6914,freq=1.0 = termFreq=1.0\n), product of:",
                      "details": [
                        {
                          "value": 4.5698156,
                          "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                          "details": [
                            {
                              "value": 123,
                              "description": "docFreq",
                              "details": []
                            },
                            {
                              "value": 11920,
                              "description": "docCount",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 1.103411,
                          "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                          "details": [
                            {
                              "value": 1,
                              "description": "termFreq=1.0",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "parameter k1",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "parameter b",
                              "details": []
                            },
                            {
                              "value": 9.224329,
                              "description": "avgFieldLength",
                              "details": []
                            },
                            {
                              "value": 7.111111,
                              "description": "fieldLength",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                },
                {
                  "value": 5.70897,
                  "description": "weight(name:克 in 6914) [PerFieldSimilarity], result of:",
                  "details": [
                    {
                      "value": 5.70897,
                      "description": "score(doc=6914,freq=1.0 = termFreq=1.0\n), product of:",
                      "details": [
                        {
                          "value": 5.173929,
                          "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
                          "details": [
                            {
                              "value": 67,
                              "description": "docFreq",
                              "details": []
                            },
                            {
                              "value": 11920,
                              "description": "docCount",
                              "details": []
                            }
                          ]
                        },
                        {
                          "value": 1.103411,
                          "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
                          "details": [
                            {
                              "value": 1,
                              "description": "termFreq=1.0",
                              "details": []
                            },
                            {
                              "value": 1.2,
                              "description": "parameter k1",
                              "details": []
                            },
                            {
                              "value": 0.75,
                              "description": "parameter b",
                              "details": []
                            },
                            {
                              "value": 9.224329,
                              "description": "avgFieldLength",
                              "details": []
                            },
                            {
                              "value": 7.111111,
                              "description": "fieldLength",
                              "details": []
                            }
                          ]
                        }
                      ]
                    }
                  ]
                }
              ]
            },
            {
              "value": 0,
              "description": "match on required clause, product of:",
              "details": [
                {
                  "value": 0,
                  "description": "# clause",
                  "details": []
                },
                {
                  "value": 1,
                  "description": "_type:shop, product of:",
                  "details": [
                    {
                      "value": 1,
                      "description": "boost",
                      "details": []
                    },
                    {
                      "value": 1,
                      "description": "queryNorm",
                      "details": []
                    }
                  ]
                }
              ]
            }
          ]
        }
      }
    ]
  }
}

1. In the shard "_shard": "[act_shop-2018.01.12][80]", the ES standard analyzer tokenizes the matched text '星巴克' into the three single-character terms '星', '巴', and '克'. (Note that the explain output uses BM25, the default similarity since Elasticsearch 5.0, so the per-term details below follow the BM25 idf and tfNorm formulas rather than the classic formula above.) Each term's score is:

'星': 4.7601295

'巴': 5.0423846

'克': 5.70897
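The document's final _score is simply the sum of these three per-term scores (the required-clause match shown in the explanation contributes 0):

```python
# Per-term scores reported in the explain output for shard [80]:
term_scores = {"星": 4.7601295, "巴": 5.0423846, "克": 5.70897}

total = sum(term_scores.values())
print(round(total, 6))  # 15.511484, matching "_score" in the response
```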

2. How is each term scored? In detail, taking '星' as an example:

score('星') = idf · tfNorm (inverse document frequency times the normalized term frequency)

The idf is computed as follows:

{
  "value": 4.7601295,
  "description": "score(doc=6914,freq=1.0 = termFreq=1.0\n), product of:",
  "details": [
    {
      "value": 4.314013,
      "description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
      "details": [
        {
          "value": 159,
          "description": "docFreq",
          "details": []
        },
        {
          "value": 11920,
          "description": "docCount",
          "details": []
        }
      ]
    }
  ]
}

docFreq: the number of documents in this shard that contain '星': 159

docCount: the total number of documents in this shard: 11920
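Plugging these two numbers into the BM25 idf formula quoted in the explain output reproduces the reported value (a quick check; the name `bm25_idf` is my own):

```python
import math

def bm25_idf(doc_freq, doc_count):
    """BM25 idf = log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

print(bm25_idf(159, 11920))  # ≈ 4.314013, the value reported in the explanation
```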

The tfNorm is computed as follows.

tf can be understood, roughly, as the relative frequency with which '星' occurs in a given document, normalized by field length:

{
  "value": 1.103411,
  "description": "tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:",
  "details": [
    {
      "value": 1,
      "description": "termFreq=1.0",
      "details": []
    },
    {
      "value": 1.2,
      "description": "parameter k1",
      "details": []
    },
    {
      "value": 0.75,
      "description": "parameter b",
      "details": []
    },
    {
      "value": 9.224329,
      "description": "avgFieldLength",
      "details": []
    },
    {
      "value": 7.111111,
      "description": "fieldLength",
      "details": []
    }
  ]
}

tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) = (1 × (1.2 + 1)) / (1 + 1.2 × (1 − 0.75 + 0.75 × 7.111111 / 9.224329)) = 1.103411
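Both the tfNorm and the final term score can be verified directly from the explain parameters (the helper name `bm25_tf_norm` is my own; small differences past the sixth decimal come from Lucene using single-precision floats):

```python
import math

def bm25_tf_norm(freq, k1, b, field_length, avg_field_length):
    """tfNorm = (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength))."""
    return (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * field_length / avg_field_length))

# Values taken from the explain output above for the term '星':
tf_norm = bm25_tf_norm(freq=1, k1=1.2, b=0.75,
                       field_length=7.111111, avg_field_length=9.224329)
idf = math.log(1 + (11920 - 159 + 0.5) / (159 + 0.5))

print(tf_norm)        # ≈ 1.103411, the reported tfNorm
print(idf * tf_norm)  # ≈ 4.7601, the reported score of '星'
```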
