mitgliederverwaltung/docs/custom-fields-search-performance.md

89 lines
4.6 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Performance Analysis: Custom Fields in Search Vector
## Current Implementation
The member `search_vector` includes custom field values via database triggers that aggregate all of a member's custom field values, extract the value from each JSONB record (`value->>'_union_value'`), and add them at weight `C`.
Two triggers maintain the vector:
- `members_search_vector_trigger()` — fires on `members` INSERT/UPDATE; runs a subquery `SELECT string_agg(...) FROM custom_field_values WHERE member_id = NEW.id`.
- `update_member_search_vector_from_custom_field_value()` — fires on `custom_field_values` INSERT/UPDATE/DELETE; re-aggregates and updates the member's `search_vector`.
Both rely on `custom_field_values_member_id_idx`, so the per-member aggregation is an indexed lookup.
## Applied Trigger Optimizations
`update_member_search_vector_from_custom_field_value()` was optimized:
- **Fetch only required member fields** (first_name, last_name, email, etc.) instead of the full record — reduces per-call overhead by roughly 3050%.
- **Early return on UPDATE when the value is unchanged** — skips the expensive re-aggregation entirely.
Measured effect per custom-field-value change:
| Case | Before | After |
|------|--------|-------|
| Value changed | 515 ms | 310 ms |
| Value unchanged (UPDATE) | 515 ms | < 1 ms (early return) |
Re-aggregation is still required whenever a value actually changes that is necessary for `search_vector` consistency.
## Search Vector Size
- String custom field values are capped at **10,000 characters each**; there is no cap on the number of custom fields per member.
- `tsvector` has no hard size limit, but very large vectors (> ~100 KB) degrade GIN index maintenance, tsvector operations, and trigger time. Worst case: 100 fields × 10,000 chars ≈ 1 MB of aggregated text for one member.
- **Recommendation:** monitor `search_vector` size in production; consider capping total custom-field content per member if large vectors appear.
## Bulk Imports
The custom-field-value trigger fires once per row, so importing many members with custom fields is expensive. For bulk imports, **temporarily disable the `custom_field_values` trigger**, then re-aggregate `search_vector` in a batch after the import. The initial backfill migration also updates all members in a single transaction (table lock); for > 10,000 members, batch the backfill and run during a maintenance window.
## Search Query Structure
Full-text search uses the GIN index on `search_vector` (fast). Substring/custom-field matching adds `EXISTS (SELECT 1 FROM custom_field_values WHERE member_id = id AND ... LIKE ...)` subqueries, which are **not indexed** on the JSONB value (sequential scan) and run even when the FTS branch already matches. This is the main known weakness; it is acceptable at the current scale (< 30 custom fields/member, < 10,000 members) but is the first thing to revisit if search slows.
## Search Filter Functions
The search query in `lib/membership/member.ex` is split into modular filter builders, combined as a single OR-chain in priority order:
1. `build_fts_filter/1` full-text search (highest priority, GIN-indexed, fastest).
2. `build_substring_filter/2` `ILIKE` substring matching on structured fields (postal_code, house_number, email, city, country).
3. `build_custom_field_filter/1` JSONB custom-field value matching via `EXISTS` subquery.
4. `build_fuzzy_filter/2` trigram fuzzy matching on first_name, last_name, street (pg_trgm).
Priority: **FTS > Substring > Custom Fields > Fuzzy**.
## Monitoring Queries
```sql
-- search_vector size distribution
SELECT
pg_size_pretty(octet_length(search_vector::text)) AS size,
COUNT(*) AS member_count
FROM members
WHERE search_vector IS NOT NULL
GROUP BY octet_length(search_vector::text)
ORDER BY octet_length(search_vector::text) DESC
LIMIT 20;
-- average / max custom fields per member
SELECT
AVG(custom_field_count) AS avg_custom_fields,
MAX(custom_field_count) AS max_custom_fields
FROM (
SELECT member_id, COUNT(*) AS custom_field_count
FROM custom_field_values
GROUP BY member_id
) subq;
-- trigger execution time (requires pg_stat_statements)
SELECT mean_exec_time, calls, query
FROM pg_stat_statements
WHERE query LIKE '%members_search_vector_trigger%'
ORDER BY mean_exec_time DESC;
```
## Future Options (if scale demands)
- Generated/searchable text column or materialized view for custom-field substring search (to escape the unindexed JSONB `LIKE`).
- Limit which custom fields are searchable, or truncate long values.
- External search service (e.g., Elasticsearch) for advanced search.