mitgliederverwaltung/docs/custom-fields-search-performance.md

4.6 KiB
Raw Blame History

Performance Analysis: Custom Fields in Search Vector

Current Implementation

The member search_vector includes custom field values via database triggers that aggregate all of a member's custom field values, extract the value from each JSONB record (value->>'_union_value'), and add them at weight C.

Two triggers maintain the vector:

  • members_search_vector_trigger() — fires on members INSERT/UPDATE; runs a subquery SELECT string_agg(...) FROM custom_field_values WHERE member_id = NEW.id.
  • update_member_search_vector_from_custom_field_value() — fires on custom_field_values INSERT/UPDATE/DELETE; re-aggregates and updates the member's search_vector.

Both rely on custom_field_values_member_id_idx, so the per-member aggregation is an indexed lookup.

Applied Trigger Optimizations

update_member_search_vector_from_custom_field_value() was optimized:

  • Fetch only required member fields (first_name, last_name, email, etc.) instead of the full record — reduces per-call overhead by roughly 3050%.
  • Early return on UPDATE when the value is unchanged — skips the expensive re-aggregation entirely.

Measured effect per custom-field-value change:

Case Before After
Value changed 515 ms 310 ms
Value unchanged (UPDATE) 515 ms < 1 ms (early return)

Re-aggregation is still required whenever a value actually changes — that is necessary for search_vector consistency.

Search Vector Size

  • String custom field values are capped at 10,000 characters each; there is no cap on the number of custom fields per member.
  • tsvector has no hard size limit, but very large vectors (> ~100 KB) degrade GIN index maintenance, tsvector operations, and trigger time. Worst case: 100 fields × 10,000 chars ≈ 1 MB of aggregated text for one member.
  • Recommendation: monitor search_vector size in production; consider capping total custom-field content per member if large vectors appear.

Bulk Imports

The custom-field-value trigger fires once per row, so importing many members with custom fields is expensive. For bulk imports, temporarily disable the custom_field_values trigger, then re-aggregate search_vector in a batch after the import. The initial backfill migration also updates all members in a single transaction (table lock); for > 10,000 members, batch the backfill and run during a maintenance window.

Search Query Structure

Full-text search uses the GIN index on search_vector (fast). Substring/custom-field matching adds EXISTS (SELECT 1 FROM custom_field_values WHERE member_id = id AND ... LIKE ...) subqueries, which are not indexed on the JSONB value (sequential scan) and run even when the FTS branch already matches. This is the main known weakness; it is acceptable at the current scale (< 30 custom fields/member, < 10,000 members) but is the first thing to revisit if search slows.

Search Filter Functions

The search query in lib/membership/member.ex is split into modular filter builders, combined as a single OR-chain in priority order:

  1. build_fts_filter/1 — full-text search (highest priority, GIN-indexed, fastest).
  2. build_substring_filter/2ILIKE substring matching on structured fields (postal_code, house_number, email, city, country).
  3. build_custom_field_filter/1 — JSONB custom-field value matching via EXISTS subquery.
  4. build_fuzzy_filter/2 — trigram fuzzy matching on first_name, last_name, street (pg_trgm).

Priority: FTS > Substring > Custom Fields > Fuzzy.

Monitoring Queries

-- search_vector size distribution
SELECT
  pg_size_pretty(octet_length(search_vector::text)) AS size,
  COUNT(*) AS member_count
FROM members
WHERE search_vector IS NOT NULL
GROUP BY octet_length(search_vector::text)
ORDER BY octet_length(search_vector::text) DESC
LIMIT 20;

-- average / max custom fields per member
SELECT
  AVG(custom_field_count) AS avg_custom_fields,
  MAX(custom_field_count) AS max_custom_fields
FROM (
  SELECT member_id, COUNT(*) AS custom_field_count
  FROM custom_field_values
  GROUP BY member_id
) subq;

-- trigger execution time (requires pg_stat_statements)
SELECT mean_exec_time, calls, query
FROM pg_stat_statements
WHERE query LIKE '%members_search_vector_trigger%'
ORDER BY mean_exec_time DESC;

Future Options (if scale demands)

  • Generated/searchable text column or materialized view for custom-field substring search (to escape the unindexed JSONB LIKE).
  • Limit which custom fields are searchable, or truncate long values.
  • External search service (e.g., Elasticsearch) for advanced search.