4.6 KiB
Performance Analysis: Custom Fields in Search Vector
Current Implementation
The member search_vector includes custom field values via database triggers that aggregate all of a member's custom field values, extract the value from each JSONB record (value->>'_union_value'), and add them at weight C.
Two triggers maintain the vector:
members_search_vector_trigger()— fires onmembersINSERT/UPDATE; runs a subquerySELECT string_agg(...) FROM custom_field_values WHERE member_id = NEW.id.update_member_search_vector_from_custom_field_value()— fires oncustom_field_valuesINSERT/UPDATE/DELETE; re-aggregates and updates the member'ssearch_vector.
Both rely on custom_field_values_member_id_idx, so the per-member aggregation is an indexed lookup.
Applied Trigger Optimizations
update_member_search_vector_from_custom_field_value() was optimized:
- Fetch only required member fields (first_name, last_name, email, etc.) instead of the full record — reduces per-call overhead by roughly 30–50%.
- Early return on UPDATE when the value is unchanged — skips the expensive re-aggregation entirely.
Measured effect per custom-field-value change:
| Case | Before | After |
|---|---|---|
| Value changed | 5–15 ms | 3–10 ms |
| Value unchanged (UPDATE) | 5–15 ms | < 1 ms (early return) |
Re-aggregation is still required whenever a value actually changes — that is necessary for search_vector consistency.
Search Vector Size
- String custom field values are capped at 10,000 characters each; there is no cap on the number of custom fields per member.
tsvectorhas no hard size limit, but very large vectors (> ~100 KB) degrade GIN index maintenance, tsvector operations, and trigger time. Worst case: 100 fields × 10,000 chars ≈ 1 MB of aggregated text for one member.- Recommendation: monitor
search_vectorsize in production; consider capping total custom-field content per member if large vectors appear.
Bulk Imports
The custom-field-value trigger fires once per row, so importing many members with custom fields is expensive. For bulk imports, temporarily disable the custom_field_values trigger, then re-aggregate search_vector in a batch after the import. The initial backfill migration also updates all members in a single transaction (table lock); for > 10,000 members, batch the backfill and run during a maintenance window.
Search Query Structure
Full-text search uses the GIN index on search_vector (fast). Substring/custom-field matching adds EXISTS (SELECT 1 FROM custom_field_values WHERE member_id = id AND ... LIKE ...) subqueries, which are not indexed on the JSONB value (sequential scan) and run even when the FTS branch already matches. This is the main known weakness; it is acceptable at the current scale (< 30 custom fields/member, < 10,000 members) but is the first thing to revisit if search slows.
Search Filter Functions
The search query in lib/membership/member.ex is split into modular filter builders, combined as a single OR-chain in priority order:
build_fts_filter/1— full-text search (highest priority, GIN-indexed, fastest).build_substring_filter/2—ILIKEsubstring matching on structured fields (postal_code, house_number, email, city, country).build_custom_field_filter/1— JSONB custom-field value matching viaEXISTSsubquery.build_fuzzy_filter/2— trigram fuzzy matching on first_name, last_name, street (pg_trgm).
Priority: FTS > Substring > Custom Fields > Fuzzy.
Monitoring Queries
-- search_vector size distribution
SELECT
pg_size_pretty(octet_length(search_vector::text)) AS size,
COUNT(*) AS member_count
FROM members
WHERE search_vector IS NOT NULL
GROUP BY octet_length(search_vector::text)
ORDER BY octet_length(search_vector::text) DESC
LIMIT 20;
-- average / max custom fields per member
SELECT
AVG(custom_field_count) AS avg_custom_fields,
MAX(custom_field_count) AS max_custom_fields
FROM (
SELECT member_id, COUNT(*) AS custom_field_count
FROM custom_field_values
GROUP BY member_id
) subq;
-- trigger execution time (requires pg_stat_statements)
SELECT mean_exec_time, calls, query
FROM pg_stat_statements
WHERE query LIKE '%members_search_vector_trigger%'
ORDER BY mean_exec_time DESC;
Future Options (if scale demands)
- Generated/searchable text column or materialized view for custom-field substring search (to escape the unindexed JSONB
LIKE). - Limit which custom fields are searchable, or truncate long values.
- External search service (e.g., Elasticsearch) for advanced search.