How long is the field? The shorter the field, the higher the weight. If a term appears in a short field, such as a title
field, it is more likely that the content of that field is about the term than if the same term appears in a much bigger body
field. The field length norm is calculated as follows:
norm(d) = 1 / √numTerms
The field-length norm (norm) is the inverse square root of the number of terms in the field.
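To make the formula concrete, here is a minimal Python sketch that evaluates norm(d) = 1 / √numTerms for a short and a long field. The function name `field_length_norm` and the term counts (4 and 400) are illustrative assumptions, not part of Elasticsearch. The value Elasticsearch actually stores is compressed into roughly one byte, so it may be rounded relative to what this sketch prints.

```python
import math


def field_length_norm(num_terms: int) -> float:
    """Return 1 / sqrt(num_terms), the formula given in the text.

    `num_terms` is the number of terms in the field (hypothetical input;
    a real index counts terms after analysis).
    """
    if num_terms <= 0:
        raise ValueError("num_terms must be positive")
    return 1.0 / math.sqrt(num_terms)


# Illustrative field sizes: a 4-term title and a 400-term body.
title_norm = field_length_norm(4)
body_norm = field_length_norm(400)

# The shorter field gets the higher norm, and so the higher weight.
print(f"title norm: {title_norm:.3f}")
print(f"body norm:  {body_norm:.3f}")
```

With these inputs the title field's norm is ten times the body field's, which is why a match in a short field counts for more.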
While the field-length norm is important for full-text search, many other fields don’t need norms. Norms consume approximately 1 byte per string field per document in the index, whether or not a document contains the field. Exact-value not_analyzed string fields have norms disabled by default, but you can use the field mapping to disable norms on analyzed fields as well:
PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "text": {
          "type": "string",
          "norms": { "enabled": false }
        }
      }
    }
  }
}
This field will not take the field-length norm into account. A long field and a short field will be scored as if they were the same length.
For use cases such as logging, norms are not useful. All you care about is whether a field contains a particular error code or a particular browser identifier. The length of the field does not affect the outcome. Disabling norms can save a significant amount of memory.
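As a rough back-of-the-envelope check on the memory claim, the sketch below multiplies the approximate cost of 1 byte per string field per document by a document count and a field count. The helper `norms_bytes` and the figures used (100 million log documents, 10 analyzed string fields) are hypothetical, chosen only to show the order of magnitude.

```python
def norms_bytes(num_docs: int, num_fields_with_norms: int,
                bytes_per_norm: int = 1) -> int:
    """Estimate norm storage: about 1 byte per field per document.

    Every document pays for every field with norms enabled, whether or
    not the document actually contains that field.
    """
    return num_docs * num_fields_with_norms * bytes_per_norm


# Hypothetical logging index: 100 million documents, 10 analyzed fields.
estimate = norms_bytes(100_000_000, 10)
print(f"approximate norms size: {estimate / 1024 ** 2:.0f} MiB")
```

Under these assumed numbers the norms alone come to roughly a gigabyte, which is the kind of saving that disabling them can deliver.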
These three factors—term frequency, inverse document frequency, and field-length norm—are calculated and stored at index time. Together, they are used to calculate the weight of a single term in a particular document.
When we refer to documents in the preceding formulae, we are actually talking about a field within a document. Each field has its own inverted index and thus, for TF/IDF purposes, the value of the field is the value of the document.
When we run a simple term query with explain set to true, we see that the only factors involved in calculating the score are the ones explained in the preceding sections:
PUT /my_index/doc/1
{ "text" : "quick brown fox" }

GET /my_index/doc/_search?explain
{
  "query": {
    "term": {
      "text": "fox"
    }
  }
}
The (abbreviated) explanation from the preceding request is as follows:
weight(text:fox in 0) [PerFieldSimilarity]:  0.15342641
result of:
    fieldWeight in 0                         0.15342641
    product of:
        tf(freq=1.0), with freq of 1:        1.0
        idf(docFreq=1, maxDocs=1):           0.30685282
        fieldNorm(doc=0):                    0.5
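Reading the explanation from top to bottom:
The final score for term fox in field text, in the document with internal Lucene doc ID 0.
The term fox appears once in the text field in this document.
The inverse document frequency of fox in the text field, across all documents in this index.
The field-length normalization factor for this field.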
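The numbers in the explanation can be checked by hand. The sketch below assumes the classic Lucene TF/IDF formulas, tf = √frequency and idf = 1 + ln(maxDocs / (docFreq + 1)), which this section refers to but does not restate. The function names are illustrative. The field norm is taken directly from the explain output rather than computed, because 1/√3 ≈ 0.577 for the three-term field is presumably rounded to 0.5 when stored in a single byte.

```python
import math


def tf(freq: float) -> float:
    """Term frequency: square root of the number of occurrences (assumed formula)."""
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)) (assumed formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


# Inputs copied from the explain output above.
term_freq = tf(1.0)               # tf(freq=1.0)
inverse_doc_freq = idf(1, 1)      # idf(docFreq=1, maxDocs=1)
field_norm = 0.5                  # fieldNorm(doc=0), as stored by the index

# The weight of the term is the product of the three factors.
weight = term_freq * inverse_doc_freq * field_norm

print(f"tf        = {term_freq}")
print(f"idf       = {inverse_doc_freq:.8f}")
print(f"fieldNorm = {field_norm}")
print(f"weight    = {weight:.8f}")
```

The product agrees with the reported weight of 0.15342641 up to floating-point rounding, confirming that no other factor contributes to the score of a single-term query.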
Of course, queries usually consist of more than one term, so we need a way of combining the weights of multiple terms. For this, we turn to the vector space model.