docs: document fuzzy search similarity threshold strategy
Explain the two-tier matching approach:
- % operator with server-wide threshold (0.3) for fast index scans
- similarity functions with configurable threshold (0.2) for edge cases

Add rationale for threshold value based on German name testing
parent add855c8cb
commit 12f95c1998
1 changed file with 25 additions and 3 deletions
@@ -42,6 +42,21 @@ defmodule Mv.Membership.Member do
   # Module constants
   @member_search_limit 10

+  # Similarity threshold for fuzzy name/address matching.
+  # Lower value = more results but less accurate (0.1-0.9)
+  #
+  # Fuzzy matching uses two complementary strategies:
+  # 1. % operator: Fast GIN-index-based matching using server-wide threshold (default 0.3)
+  #    - Catches exact trigram matches quickly via index
+  # 2. similarity/word_similarity functions: Precise matching with this configurable threshold
+  #    - Catches partial matches that % operator might miss
+  #
+  # Value 0.2 chosen based on testing with typical German names:
+  # - "Müller" vs "Mueller": similarity ~0.65 ✓
+  # - "Schmidt" vs "Schmitt": similarity ~0.75 ✓
+  # - "Wagner" vs "Wegner": similarity ~0.55 ✓
+  # - Random unrelated names: similarity ~0.15 ✗
+  @default_similarity_threshold 0.2

   # Use constants from Mv.Constants for member fields

@@ -539,9 +554,16 @@ defmodule Mv.Membership.Member do
     )
   end

-  # Builds fuzzy/trigram matching filter for name and street fields
-  # Uses pg_trgm extension with GIN indexes for performance
-  # Note: Requires trigram indexes on first_name, last_name, street
+  # Builds fuzzy/trigram matching filter for name and street fields.
+  # Uses pg_trgm extension with GIN indexes for performance.
+  #
+  # Two-tier matching strategy:
+  # - % operator: Uses server-wide pg_trgm.similarity_threshold (typically 0.3)
+  #   for fast index-based initial filtering
+  # - similarity/word_similarity: Uses @default_similarity_threshold (0.2)
+  #   for more lenient matching to catch edge cases
+  #
+  # Note: Requires trigram GIN indexes on first_name, last_name, street.
   defp build_fuzzy_filter(query, threshold) do
     expr(
       fragment("? % first_name", ^query) or
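The comments in this commit rely on trigram similarity scores and on comparing them against a threshold. As a rough illustration of how such a score comes about, here is a minimal sketch in plain Elixir. `TrigramSketch` and its functions are hypothetical names and are not part of `Mv.Membership.Member`. The sketch assumes single-word inputs and approximates the padding and set-ratio scheme documented for pg_trgm; PostgreSQL's own results can differ (for example through locale handling), so the scores it produces need not match the figures quoted in the commit.

```elixir
defmodule TrigramSketch do
  @moduledoc """
  Hypothetical helper for illustration only. Approximates pg_trgm's
  similarity() for single-word inputs; it is not the code used by the
  member search.
  """

  # Lowercase the word, pad it with two leading spaces and one trailing
  # space, then collect the distinct three-grapheme windows.
  def trigrams(word) do
    ("  " <> String.downcase(word) <> " ")
    |> String.graphemes()
    |> Enum.chunk_every(3, 1, :discard)
    |> Enum.map(&Enum.join/1)
    |> MapSet.new()
  end

  # Number of shared trigrams divided by the number of distinct trigrams
  # that occur in either word.
  def similarity(a, b) do
    ta = trigrams(a)
    tb = trigrams(b)
    shared = MapSet.size(MapSet.intersection(ta, tb))
    total = MapSet.size(MapSet.union(ta, tb))
    shared / total
  end

  # True when the pair reaches the given threshold. This stands in for the
  # threshold comparison only; it does not model index usage.
  def similar?(a, b, threshold), do: similarity(a, b) >= threshold
end
```

Calling `TrigramSketch.similar?("Wagner", "Wegner", 0.2)` and then repeating the call with `0.3` shows how the choice of threshold decides whether a given pair counts as a match, which is the trade-off the "lower value = more results" comment describes.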