In Elasticsearch, given the following document structure:
{
  "workhistory": {
    "positions": [
      {
        "company": "Some company",
        "position": "Some Job Title",
        "start": 1356998400,
        "end": 34546576576,
        "description": "",
        "source": [
          "some source",
          "some other source"
        ]
      },
      {
        "company": "Some other company",
        "position": "Job Title",
        "start": 1356998400,
        "end": "",
        "description": "",
        "source": [
          "some other source"
        ]
      }
    ]
  }
}
and this mapping for that structure:
{
  "workhistory": {
    "properties": {
      "positions": {
        "type": "nested",
        "include_in_parent": true,
        "properties": {
          "company": {
            "type": "multi_field",
            "fields": {
              "company": {"type": "string"},
              "original": {"type": "string", "analyzer": "string_lowercase"}
            }
          },
          "position": {
            "type": "multi_field",
            "fields": {
              "position": {"type": "string"},
              "original": {"type": "string", "analyzer": "string_lowercase"}
            }
          }
        }
      }
    }
  }
}
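I want to be able to search on "company" and match the document if company = "some company", etc. I then want to get the tf-idf _score. I also want to create a function_score query that boosts the score of this match based on the values in the "source" field array. Basically, if source contains "some source", boost the _score by x amount. I can change the structure of the "source" property if needed.
Here's what I have so far: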
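To make the requirement concrete, here is a small sketch of the intended scoring written outside Elasticsearch. It assumes that "boost by x amount" means adding a fixed, configurable number to the tf-idf _score for each listed source value that appears in the "source" array. `desired_score` and `source_boosts` are hypothetical names, and this is an illustration of the goal, not of how Elasticsearch computes scores.

```python
def desired_score(tfidf_score, sources, source_boosts):
    """Hypothetical illustration of the requested boost (not Elasticsearch code).

    tfidf_score   -- the relevance score from matching "company"
    sources       -- the document's "source" array
    source_boosts -- maps a source value to the amount x added when it is present
    """
    # Assumption: boosts are additive, one per source value found in the array.
    bonus = sum(x for source, x in source_boosts.items() if source in sources)
    return tfidf_score + bonus


boosts = {"some source": 2, "xxx": 4}
print(desired_score(1.0, ["some source", "some other source"], boosts))  # 3.0
print(desired_score(1.0, ["some other source"], boosts))                 # 1.0
```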
{
  "bool": {
    "should": [
      {
        "filtered": {
          "query": {
            "bool": {
              "should": [
                {
                  "bool": {
                    "should": [
                      {
                        "match": {
                          "workhistory.positions.company.original": "some company"
                        }
                      }
                    ]
                  }
                }
              ],
              "minimum_should_match": "100%"
            }
          },
          "filter": {
            "and": [
              {
                "bool": {
                  "should": [
                    {
                      "term": {
                        "workhistory.positions.company.original": "some company"
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      },
      {
        "function_score": {
          "query": {
            "bool": {
              "should": [
                {
                  "bool": {
                    "should": [
                      {
                        "match": {
                          "workhistory.positions.company.original": "some company"
                        }
                      }
                    ]
                  }
                }
              ],
              "minimum_should_match": "100%"
            }
          },
          "filter": {
            "and": [
              {
                "bool": {
                  "should": [
                    {
                      "term": {
                        "workhistory.positions.company.original": "some company"
                      }
                    }
                  ]
                }
              }
            ]
          }
        }
      }
    ]
  }
}
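There are also some filters in there because I only want to return documents that match the filtered values. In this example the filter and the query are basically the same, but in the larger version of this query I have some other "optional" matches that boost optional values, etc. The function_score isn't doing much right now because I can't really figure out how to use it. The goal is to be able to adjust the boost amounts in the application code and pass them to the search query.
I'm using Elasticsearch version 1.3.4.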
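Best answer: To be honest, I'm not sure why you are repeating all those filters and queries. Maybe I'm missing something, but based on your description I believe all you need is a "function_score". From the documentation: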
The function_score allows you to modify the score of documents that are retrieved by a query.
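So you define a query (for example, matching the company name) and then a list of functions that boost the _score for a certain subset of documents. From the same documentation: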
Furthermore, several functions can be combined. In this case one can optionally choose to apply the function only if a document matches a given filter
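So you have a query that finds companies with a specific name, and each function has a filter that selects the documents whose _score it modifies. In this case the filter is that "source" should contain a certain value. The function itself is a script: _score + 2.
In the end, this is what I have in mind: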
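It can help to see how the settings used in the query (score_mode "sum", boost_mode "sum", max_boost 5) combine with a script such as "_score + 2". The sketch below is a simplified model, assuming the 1.x function_score behaviour as I understand it: the results of all functions whose filter matched are added (score_mode "sum"), max_boost caps that combined function result, and boost_mode "sum" adds the capped value to the query score. `combined_score` is a hypothetical name, and the sketch only models documents matched by at least one function filter. One thing it makes visible: because the script itself returns "_score + 2", the query score ends up counted twice; a script returning just the constant would add only the constant.

```python
def combined_score(query_score, matching_function_scores, max_boost):
    """Simplified model of function_score with score_mode=sum and boost_mode=sum.

    matching_function_scores -- the value produced by each function whose
    filter matched the document (the unmatched case is not modelled here).
    """
    if not matching_function_scores:
        raise ValueError("this sketch only models documents matched by a function filter")
    # score_mode "sum": add up the results of the matching functions
    function_score = sum(matching_function_scores)
    # max_boost caps the combined function result (assumed 1.x behaviour)
    function_score = min(function_score, max_boost)
    # boost_mode "sum": add the function result to the query score
    return query_score + function_score


q = 1.5  # an example query (_score) value
# only "some source" matches: the script "_score + 2" yields q + 2 = 3.5
print(combined_score(q, [q + 2], max_boost=5))         # 5.0
# both filters match: (q + 2) + (q + 4) = 9.0, capped to 5
print(combined_score(q, [q + 2, q + 4], max_boost=5))  # 6.5
```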
{
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {
              "bool": {
                "should": [
                  {
                    "bool": {
                      "should": [
                        {
                          "match": {
                            "workhistory.positions.company.original": "some company"
                          }
                        }
                      ]
                    }
                  }
                ],
                "minimum_should_match": "100%"
              }
            },
            "functions": [
              {
                "filter": {
                  "nested": {
                    "path": "workhistory.positions",
                    "query": {
                      "bool": {
                        "should": [
                          {
                            "match": {
                              "workhistory.positions.source": "some source"
                            }
                          }
                        ]
                      }
                    }
                  }
                },
                "script_score": {
                  "script": "_score + 2"
                }
              },
              {
                "filter": {
                  "nested": {
                    "path": "workhistory.positions",
                    "query": {
                      "bool": {
                        "should": [
                          {
                            "match": {
                              "workhistory.positions.source": "xxx"
                            }
                          }
                        ]
                      }
                    }
                  }
                },
                "script_score": {
                  "script": "_score + 4"
                }
              }
            ],
            "max_boost": 5,
            "score_mode": "sum",
            "boost_mode": "sum"
          }
        }
      ]
    }
  }
}
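Since the goal is to set the boost amounts in application code, here is a minimal Python sketch that builds the same kind of function_score body from a dict of source values and boosts. Assumptions: `build_company_query` is a hypothetical helper; the redundant nested `bool` wrappers from the query above are dropped (a single `match` is equivalent here); and each boost is passed through the `params` object of `script_score`, as the 1.x script syntax allows, rather than being written into the script string. Sending the body is left to whatever client the application already uses.

```python
import json


def build_company_query(company, source_boosts, max_boost=5):
    """Build a function_score search body (hypothetical helper).

    company       -- value matched against workhistory.positions.company.original
    source_boosts -- dict mapping a "source" value to the number used in the script
    max_boost     -- cap on the combined function score, as in the query above
    """
    functions = []
    for source, boost in source_boosts.items():
        functions.append({
            # The function applies only to documents with a nested
            # position whose "source" matches this value.
            "filter": {
                "nested": {
                    "path": "workhistory.positions",
                    "query": {
                        "match": {"workhistory.positions.source": source}
                    },
                }
            },
            # The boost is supplied as a script parameter, so the script
            # text stays the same and only the parameter value changes.
            "script_score": {
                "script": "_score + boost",
                "params": {"boost": boost},
            },
        })

    return {
        "query": {
            "function_score": {
                "query": {
                    "match": {"workhistory.positions.company.original": company}
                },
                "functions": functions,
                "max_boost": max_boost,
                "score_mode": "sum",
                "boost_mode": "sum",
            }
        }
    }


if __name__ == "__main__":
    body = build_company_query("some company", {"some source": 2, "xxx": 4})
    print(json.dumps(body, indent=2))
```

Changing the numbers in the dict (for example from configuration) changes the boosts without touching the query template.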