X-Scores
Typesense, the underlying search engine, does not normalize its result scores, so a raw score gives no absolute indication of “accuracy of match”. Scores are only meaningful relative to the other results returned by the same query.
To create a normalized score, we have implemented a simple formula based on how Typesense does its matching.
For a given input query, Typesense returns the following metrics about each result:
num_tokens_dropped indicates how many words (tokens) from the original search query were removed to find the returned documents.
typo_prefix_score indicates the degree to which a match was achieved using a prefix of the search term or by applying typo-correction logic to the beginning of a word in the document.
A higher typo_prefix_score suggests that the result was found through typo correction or prefix matching rather than a complete, exact-text match.
Generally, an exact match should have a typo_prefix_score of 0 or a very low number, while a match on a partial or typo-ridden query might have a higher score.
query_tokens_count is the number of tokens in the search query after punctuation and stop words are stripped out.
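If query_tokens_count ever needs to be reproduced outside the search engine (for example in a unit test), its definition can be approximated as below. This is only a sketch and not Typesense's own tokenizer: the function name count_query_tokens and the STOP_WORDS list are assumptions made for illustration, and the real stop-word set depends on how the search is configured.

```python
import re

# Hypothetical stop-word list; the list actually used is not given in this document.
STOP_WORDS = frozenset({"a", "an", "and", "of", "the", "to"})


def count_query_tokens(query: str, stop_words=STOP_WORDS) -> int:
    """Approximate query_tokens_count: tokens left after dropping punctuation and stop words."""
    # \w+ keeps runs of letters, digits and underscores, so punctuation is discarded.
    tokens = re.findall(r"\w+", query.lower())
    return sum(1 for token in tokens if token not in stop_words)


print(count_query_tokens("The history of Rome, Italy!"))  # 3 ("history", "rome", "italy")
```

Splitting on \w+ also breaks contractions such as "don't" into two tokens, so counts may differ slightly from the engine's.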
Given these metrics, we calculate a normalized score (an integer from 1 to 5, where 5 is an exact match) for each result as follows:
# Wrapped in a function so the return statements are valid; the function name is illustrative.
def normalized_score(text_match_info, query_tokens_count):
    num_tokens_dropped = text_match_info.get("num_tokens_dropped", 0)
    typo_prefix_score = text_match_info.get("typo_prefix_score", 0)

    # Perfect match criteria
    if num_tokens_dropped == 0 and typo_prefix_score == 0:
        return 5  # Exact match

    # Very good match - at most one token dropped, possibly with minor typos
    if num_tokens_dropped == 0 and typo_prefix_score > 0:
        return 4  # Minor typos but all query tokens kept
    elif num_tokens_dropped <= 1:
        return 4  # One token dropped

    # Good match or worse - at least two tokens were dropped from here on
    token_match_ratio = 1 - (num_tokens_dropped / query_tokens_count)
    if token_match_ratio >= 0.8:  # 80% of tokens matched
        # The "one token dropped, no typos" case already returned 4 above,
        # so this tier can only yield 3.
        return 3
    elif token_match_ratio >= 0.6:  # 60% of tokens matched
        return 3
    elif token_match_ratio >= 0.4:  # 40% of tokens matched
        return 2
    else:
        return 1  # Poor match
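To make the tiers concrete, here is a condensed restatement with a few worked inputs. It is a sketch under assumptions: the helper name score_tier is hypothetical, the metrics are passed as plain integers rather than read from text_match_info, and the sample values are made up for illustration. It relies on the observation that zero or one dropped token always scores at least 4, so the ratio only matters once two or more tokens are dropped.

```python
def score_tier(num_tokens_dropped: int, typo_prefix_score: int, query_tokens_count: int) -> int:
    """Condensed restatement of the scoring tiers (hypothetical helper name)."""
    if num_tokens_dropped <= 1:
        # Only "nothing dropped and no typo/prefix penalty" reaches 5.
        exact = num_tokens_dropped == 0 and typo_prefix_score == 0
        return 5 if exact else 4
    # Two or more tokens dropped: grade by the share of query tokens kept.
    # Assumes query_tokens_count > 0 whenever tokens were dropped.
    ratio = 1 - num_tokens_dropped / query_tokens_count
    if ratio >= 0.6:
        return 3
    if ratio >= 0.4:
        return 2
    return 1


# Illustrative inputs for a 10-token query.
examples = {
    "exact match": score_tier(0, 0, 10),          # 5
    "typos, nothing dropped": score_tier(0, 2, 10),  # 4
    "one token dropped": score_tier(1, 0, 10),    # 4
    "three tokens dropped": score_tier(3, 0, 10),  # 3 (70% kept)
    "five tokens dropped": score_tier(5, 0, 10),  # 2 (50% kept)
    "seven tokens dropped": score_tier(7, 0, 10),  # 1 (30% kept)
}
for label, score in examples.items():
    print(f"{label}: {score}")
```

The sample inputs deliberately avoid the exact 80%, 60% and 40% boundaries, where floating-point rounding of the ratio could affect which tier is chosen.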