# Performance Analysis: Custom Fields in Search Vector

## Current Implementation

The member `search_vector` includes custom field values via database triggers. The triggers aggregate all of a member's custom field values, extract the text from each JSONB record (`value->>'_union_value'`), and add it to the vector at weight `C`.

Two triggers maintain the vector:

- `members_search_vector_trigger()`: fires on `members` INSERT/UPDATE; runs a subquery `SELECT string_agg(...) FROM custom_field_values WHERE member_id = NEW.id`.
- `update_member_search_vector_from_custom_field_value()`: fires on `custom_field_values` INSERT/UPDATE/DELETE; re-aggregates and updates the member's `search_vector`.

Both rely on `custom_field_values_member_id_idx`, so the per-member aggregation is an indexed lookup.

## Applied Trigger Optimizations

`update_member_search_vector_from_custom_field_value()` was optimized in two ways:

- **Fetch only required member fields** (first_name, last_name, email, etc.) instead of the full record, which reduces per-call overhead by roughly 30–50%.
- **Early return on UPDATE when the value is unchanged**, which skips the expensive re-aggregation entirely.

Measured effect per custom-field-value change:

| Case | Before | After |
|------|--------|-------|
| Value changed | 5–15 ms | 3–10 ms |
| Value unchanged (UPDATE) | 5–15 ms | < 1 ms (early return) |

Re-aggregation is still required whenever a value actually changes; that is necessary for `search_vector` consistency.

## Search Vector Size

- String custom field values are capped at **10,000 characters each**; there is no cap on the number of custom fields per member.
- PostgreSQL limits a `tsvector` to just under 1 MB (lexemes plus positions), and very large vectors (> ~100 KB) already degrade GIN index maintenance, tsvector operations, and trigger time. Worst case: 100 fields × 10,000 chars ≈ 1 MB of aggregated text for one member, which could approach that limit.
- **Recommendation:** monitor `search_vector` size in production; consider capping total custom-field content per member if large vectors appear.

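The capping recommendation could be prototyped in the aggregation step. The following is a minimal sketch, not the project's actual trigger code: the function name `capped_custom_field_vector`, the `uuid` key type, the `'simple'` text search configuration, and the 50,000-character default are all assumptions chosen for illustration.

```sql
-- Hypothetical helper: builds the weight-C part of the vector from a
-- member's custom field values, truncating the aggregated text first.
-- Assumed: uuid member ids, 'simple' config, 50,000-character default cap.
CREATE OR REPLACE FUNCTION capped_custom_field_vector(
  p_member_id uuid,
  p_max_chars integer DEFAULT 50000
)
RETURNS tsvector
LANGUAGE sql
STABLE
AS $$
  SELECT setweight(
           to_tsvector(
             'simple',
             -- string_agg yields NULL when the member has no values
             left(coalesce(string_agg(value->>'_union_value', ' '), ''), p_max_chars)
           ),
           'C'
         )
  FROM custom_field_values
  WHERE member_id = p_member_id;
$$;

-- Example: inspect the resulting size for a few members.
SELECT id, octet_length(capped_custom_field_vector(id)::text) AS capped_bytes
FROM members
LIMIT 5;
```

Truncating with `left` can cut the last word in half, so the final lexeme may be a fragment; at this cap size that is likely an acceptable trade-off, but it is a design choice rather than a requirement.
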
## Bulk Imports

The custom-field-value trigger fires once per row, so importing many members with custom fields is expensive. For bulk imports, **temporarily disable the `custom_field_values` trigger**, then re-aggregate `search_vector` in a batch after the import.

The initial backfill migration also updates all members in a single transaction (table lock); for > 10,000 members, batch the backfill and run it during a maintenance window.

## Search Query Structure

Full-text search uses the GIN index on `search_vector` (fast). Substring/custom-field matching adds `EXISTS (SELECT 1 FROM custom_field_values WHERE member_id = id AND ... LIKE ...)` subqueries, which are **not indexed** on the JSONB value (sequential scan) and run even when the FTS branch already matches.

This is the main known weakness. It is acceptable at the current scale (< 30 custom fields/member, < 10,000 members) but is the first thing to revisit if search slows.

## Search Filter Functions

The search query in `lib/membership/member.ex` is split into modular filter builders, combined as a single OR-chain in priority order:

1. `build_fts_filter/1`: full-text search (highest priority, GIN-indexed, fastest).
2. `build_substring_filter/2`: `ILIKE` substring matching on structured fields (postal_code, house_number, email, city, country).
3. `build_custom_field_filter/1`: JSONB custom-field value matching via `EXISTS` subquery.
4. `build_fuzzy_filter/2`: trigram fuzzy matching on first_name, last_name, street (pg_trgm).

Priority: **FTS > Substring > Custom Fields > Fuzzy**.

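For orientation, the SQL produced by this OR-chain has roughly the following shape. This is a hand-written sketch, not the generated query: the search term, the `'simple'` configuration, the use of `plainto_tsquery`, and the exact predicates per column are assumptions. Only the four branches and their order come from the list above.

```sql
-- Hand-written approximation of the combined search filter (assumed predicates).
SELECT m.id
FROM members AS m
WHERE
  -- 1. FTS branch: can use the GIN index on search_vector
  m.search_vector @@ plainto_tsquery('simple', 'berlin')
  -- 2. Substring branch on structured fields (subset shown)
  OR m.postal_code ILIKE '%berlin%'
  OR m.city ILIKE '%berlin%'
  OR m.email ILIKE '%berlin%'
  -- 3. Custom-field branch: no index on the extracted JSONB value
  OR EXISTS (
       SELECT 1
       FROM custom_field_values AS cfv
       WHERE cfv.member_id = m.id
         AND cfv.value->>'_union_value' ILIKE '%berlin%'
     )
  -- 4. Fuzzy branch: pg_trgm similarity operator
  OR m.first_name % 'berlin'
  OR m.last_name % 'berlin'
  OR m.street % 'berlin';
```

Prefixing a query like this with `EXPLAIN (ANALYZE, BUFFERS)` is one way to check how much time the custom-field subquery contributes at the current data size.
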
## Monitoring Queries

```sql
-- search_vector size distribution
-- (octet_length returns integer; the bigint cast avoids an ambiguous pg_size_pretty call)
SELECT
  pg_size_pretty(octet_length(search_vector::text)::bigint) AS size,
  COUNT(*) AS member_count
FROM members
WHERE search_vector IS NOT NULL
GROUP BY octet_length(search_vector::text)
ORDER BY octet_length(search_vector::text) DESC
LIMIT 20;

-- average / max custom fields per member
SELECT
  AVG(custom_field_count) AS avg_custom_fields,
  MAX(custom_field_count) AS max_custom_fields
FROM (
  SELECT member_id, COUNT(*) AS custom_field_count
  FROM custom_field_values
  GROUP BY member_id
) subq;

-- trigger execution time (requires pg_stat_statements; statements executed
-- inside trigger functions are only recorded with pg_stat_statements.track = 'all')
SELECT mean_exec_time, calls, query
FROM pg_stat_statements
WHERE query LIKE '%members_search_vector_trigger%'
ORDER BY mean_exec_time DESC;
```

## Future Options (if scale demands)

- Generated/searchable text column or materialized view for custom-field substring search (to escape the unindexed JSONB `LIKE`).
- Limit which custom fields are searchable, or truncate long values.
- External search service (e.g., Elasticsearch) for advanced search.
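
The first option could look like the sketch below on PostgreSQL 12 or later. The column and index names (`value_text`, `custom_field_values_value_text_trgm_idx`) are hypothetical, and pairing the generated column with a trigram GIN index is an assumption about how the column would be made searchable, not something the current schema does.

```sql
-- pg_trgm is already needed by the fuzzy filter; this is a no-op if installed.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical stored column holding the extracted text value.
-- Note: adding a STORED generated column rewrites the table.
ALTER TABLE custom_field_values
  ADD COLUMN value_text text
  GENERATED ALWAYS AS (value->>'_union_value') STORED;

-- Trigram GIN index so LIKE / ILIKE '%term%' can use an index scan.
CREATE INDEX custom_field_values_value_text_trgm_idx
  ON custom_field_values
  USING gin (value_text gin_trgm_ops);

-- The custom-field branch would then filter on the new column:
SELECT DISTINCT member_id
FROM custom_field_values
WHERE value_text ILIKE '%berlin%';
```

Trigram indexes help least with very short search terms (fewer than three characters), so a minimum term length for this branch may be worth considering alongside it.
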