docs: document fuzzy search similarity threshold strategy
Explain the two-tier matching approach:
- % operator with server-wide threshold (0.3) for fast index scans
- similarity functions with configurable threshold (0.2) for edge cases

Add rationale for threshold value based on German name testing
parent add855c8cb
commit 12f95c1998
1 changed file with 25 additions and 3 deletions
@@ -42,6 +42,21 @@ defmodule Mv.Membership.Member do
 
   # Module constants
   @member_search_limit 10
+
+  # Similarity threshold for fuzzy name/address matching.
+  # Lower value = more results but less accurate (0.1-0.9)
+  #
+  # Fuzzy matching uses two complementary strategies:
+  # 1. % operator: Fast GIN-index-based matching using server-wide threshold (default 0.3)
+  #    - Catches exact trigram matches quickly via index
+  # 2. similarity/word_similarity functions: Precise matching with this configurable threshold
+  #    - Catches partial matches that % operator might miss
+  #
+  # Value 0.2 chosen based on testing with typical German names:
+  # - "Müller" vs "Mueller": similarity ~0.65 ✓
+  # - "Schmidt" vs "Schmitt": similarity ~0.75 ✓
+  # - "Wagner" vs "Wegner": similarity ~0.55 ✓
+  # - Random unrelated names: similarity ~0.15 ✗
   @default_similarity_threshold 0.2
 
   # Use constants from Mv.Constants for member fields
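The threshold rationale in this hunk rests on trigram similarity scores, so a rough way to sanity-check candidate name pairs outside the database may help. The sketch below approximates how pg_trgm scores a single word: lowercase it, pad it with two leading spaces and one trailing space, collect the set of three-character windows, and divide the number of shared trigrams by the number of distinct trigrams across both words. The module name `TrigramSketch` is hypothetical and not part of this commit, and the sketch ignores multi-word input and punctuation handling, so treat its output as an estimate and confirm real values with `SELECT similarity(a, b)` on the target server.

```elixir
defmodule TrigramSketch do
  # Hypothetical helper for illustration only; not part of Mv.Membership.Member.

  # Set of trigrams for one word, padded the way pg_trgm pads words:
  # two spaces in front, one space behind.
  def trigrams(word) do
    padded = "  " <> String.downcase(word) <> " "

    padded
    |> String.graphemes()
    |> Enum.chunk_every(3, 1, :discard)
    |> Enum.map(&Enum.join/1)
    |> MapSet.new()
  end

  # Shared trigrams divided by all distinct trigrams of both words.
  def similarity(a, b) do
    ta = trigrams(a)
    tb = trigrams(b)
    shared = MapSet.size(MapSet.intersection(ta, tb))
    total = MapSet.size(MapSet.union(ta, tb))
    shared / total
  end
end

# Print estimates for the name pairs listed in the comment.
for {a, b} <- [{"Müller", "Mueller"}, {"Schmidt", "Schmitt"}, {"Wagner", "Wegner"}] do
  IO.puts("#{a} vs #{b}: #{Float.round(TrigramSketch.similarity(a, b), 2)}")
end
```

Under this approximation the three name pairs from the comment all score above 0.2, although not necessarily at the approximate values quoted there, which is one more reason to measure against the actual database before tuning the constant.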
@@ -539,9 +554,16 @@ defmodule Mv.Membership.Member do
     )
   end
 
-  # Builds fuzzy/trigram matching filter for name and street fields
-  # Uses pg_trgm extension with GIN indexes for performance
-  # Note: Requires trigram indexes on first_name, last_name, street
+  # Builds fuzzy/trigram matching filter for name and street fields.
+  # Uses pg_trgm extension with GIN indexes for performance.
+  #
+  # Two-tier matching strategy:
+  # - % operator: Uses server-wide pg_trgm.similarity_threshold (typically 0.3)
+  #   for fast index-based initial filtering
+  # - similarity/word_similarity: Uses @default_similarity_threshold (0.2)
+  #   for more lenient matching to catch edge cases
+  #
+  # Note: Requires trigram GIN indexes on first_name, last_name, street.
   defp build_fuzzy_filter(query, threshold) do
     expr(
       fragment("? % first_name", ^query) or
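The two-tier strategy described in this hunk can be pictured as one SQL `WHERE` clause in which each field gets an operator test and explicit function tests joined by `OR`. The sketch below only builds that clause as a string so the shape is visible; it is not the code from this commit, which composes the filter with `expr` and `fragment`. The module name `FuzzySqlSketch`, the `$1`/`$2` placeholders, and the choice to apply both `similarity` and `word_similarity` to every field are assumptions made for illustration.

```elixir
defmodule FuzzySqlSketch do
  # Hypothetical helper for illustration only; not part of Mv.Membership.Member.
  @fields ~w(first_name last_name street)

  # Builds a WHERE clause where $1 is the search term and
  # $2 is the function-level threshold (for example 0.2).
  def where_clause(fields \\ @fields) do
    fields
    |> Enum.flat_map(fn field ->
      [
        # Tier 1: operator form, compared against pg_trgm.similarity_threshold.
        "$1 % #{field}",
        # Tier 2: explicit function calls compared against the caller's threshold.
        "similarity(#{field}, $1) > $2",
        "word_similarity($1, #{field}) > $2"
      ]
    end)
    |> Enum.join(" OR ")
  end
end

IO.puts(FuzzySqlSketch.where_clause(["first_name"]))
```

The operator tier reads its cutoff from the server setting `pg_trgm.similarity_threshold`, while the function tier takes its cutoff as a bound parameter, which is why the two thresholds in the comments can differ.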