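I am using Solr search with faceting in my application. My use case is such that the index files in the data directory keep changing.
The problem shows up when I facet on a particular field: I get values from indexes that used to be in the data directory (and are no longer present), but they come back with a count of 0. I don't understand where the previously indexed values persist, or why they are returned during a completely new search.
I could simply skip the facets whose count is 0, but I know this can seriously hurt my scalability. Any pointers on how to exclude facets left over from previous searchers?
[Edit 1]: The workaround I am currently using is to add facet.mincount=1 to my URL. However, I suspect this may hurt my performance.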
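For reference, the workaround from the edit above could be sketched like this in Python. This is only a minimal sketch: the host, the core name `collection1`, the field name `tags`, and the helper `build_facet_url` are assumptions made for illustration, not taken from my actual setup. `facet`, `facet.field` and `facet.mincount` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_facet_url(base_url, field, mincount=1):
    """Build a Solr select URL that facets on `field` and hides low counts.

    `build_facet_url` is a hypothetical helper written for this example.
    """
    params = {
        "q": "*:*",                  # match all documents
        "rows": 0,                   # only the facet counts are of interest
        "wt": "json",
        "facet": "true",
        "facet.field": field,
        "facet.mincount": mincount,  # drop facet values whose count is below this
    }
    return base_url + "?" + urlencode(params)


# Assumed host and core name; adjust to the real deployment.
url = build_facet_url("http://localhost:8983/solr/collection1/select", "tags")
print(url)
```

With `mincount=1` the zero-count entries are filtered out on the Solr side, so the client never has to skip them.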
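Best answer: I couldn't find the comment option, and I don't have enough reputation to upvote!
I have the same problem.
We are using Solr 4.2 with atomic updates.
I found some explanation here: http://collab.sakaiproject.org/pipermail/oae-dev/2011-November/000693.html
Excerpt: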
To efficiently handle facets for multi-valued fields (like tags), Solr
builds an “uninverted index” (which you think would just be called an
“index”, but I suppose that’s even more confusing), which maps
internal document IDs to the list of terms they contain. Calculating
facets from this data structure just requires walking over every
document in the result set, looking up the terms it contains in the
uninverted index, and adding them to the tally for all documents.

However, there’s a sneaky optimisation here that causes the zero
counts we’re seeing. For terms that appear in more than 5% of
documents, Solr doesn’t include them in the uninverted index (leaving
them out helps to keep the size in memory down, I guess), and instead
gets the count for these terms using a regular query against the
Lucene index. Since the set of “common” terms isn’t specific to your
result set, and since any given result set won’t necessarily contain
all of these terms, you can get back counts of zero.
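So the zero counts may not come from old index values at all, but simply from terms that are present in more than 5% of the documents?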
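To make the excerpt more concrete, here is a toy Python model of the behaviour it describes. It is not Solr's actual code: the function names, the in-memory `docs` dictionary, and the sample tags are all made up for illustration, and the 5% threshold is simply taken from the excerpt. Rare terms are tallied by walking the result set, while "common" terms are counted one by one against the result set, so they appear even when that count is zero.

```python
from collections import Counter


def facet_counts(docs, result_ids, big_term_threshold=0.05):
    """Toy model of the faceting strategy from the excerpt (not Solr's code)."""
    # Document frequency of every term across the whole index.
    df = Counter(t for terms in docs.values() for t in set(terms))
    big_terms = {t for t, n in df.items() if n > big_term_threshold * len(docs)}

    # "Uninverted index": doc id -> its terms, with the common terms left out.
    uninverted = {
        doc_id: [t for t in set(terms) if t not in big_terms]
        for doc_id, terms in docs.items()
    }

    counts = Counter()
    # Rare terms: walk the result set and tally what each document contains.
    for doc_id in result_ids:
        counts.update(uninverted[doc_id])

    # Common terms: each one is counted against the result set separately,
    # so it gets an entry even if no document in the result set contains it.
    for term in big_terms:
        counts[term] = sum(1 for doc_id in result_ids if term in docs[doc_id])
    return dict(counts)


def drop_zero_counts(counts, mincount=1):
    """Mimic the effect of facet.mincount on the returned facet list."""
    return {t: n for t, n in counts.items() if n >= mincount}


# 100 documents: "java" is in 10 of them (10%), every other tag is unique (1%).
docs = {i: ["java"] if i < 10 else ["tag%d" % i] for i in range(100)}

# A result set that contains no "java" documents at all.
counts = facet_counts(docs, result_ids=[10, 11])
print(counts)
print(drop_zero_counts(counts))
```

In this model the rare tags of documents outside the result set never show up, but the common term `java` does, with a count of 0. If that matches what you see, `facet.mincount=1` is the right way to hide those entries.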