89 lines
4.6 KiB
Markdown
89 lines
4.6 KiB
Markdown
# Performance Analysis: Custom Fields in Search Vector
|
||
|
||
## Current Implementation
|
||
|
||
The member `search_vector` includes custom field values via database triggers that aggregate all of a member's custom field values, extract the value from each JSONB record (`value->>'_union_value'`), and add them at weight `C`.
|
||
|
||
Two triggers maintain the vector:
|
||
|
||
- `members_search_vector_trigger()` — fires on `members` INSERT/UPDATE; runs a subquery `SELECT string_agg(...) FROM custom_field_values WHERE member_id = NEW.id`.
|
||
- `update_member_search_vector_from_custom_field_value()` — fires on `custom_field_values` INSERT/UPDATE/DELETE; re-aggregates and updates the member's `search_vector`.
|
||
|
||
Both rely on `custom_field_values_member_id_idx`, so the per-member aggregation is an indexed lookup.
|
||
|
||
## Applied Trigger Optimizations
|
||
|
||
`update_member_search_vector_from_custom_field_value()` was optimized:
|
||
|
||
- **Fetch only required member fields** (first_name, last_name, email, etc.) instead of the full record — reduces per-call overhead by roughly 30–50%.
|
||
- **Early return on UPDATE when the value is unchanged** — skips the expensive re-aggregation entirely.
|
||
|
||
Measured effect per custom-field-value change:
|
||
|
||
| Case | Before | After |
|
||
|------|--------|-------|
|
||
| Value changed | 5–15 ms | 3–10 ms |
|
||
| Value unchanged (UPDATE) | 5–15 ms | < 1 ms (early return) |
|
||
|
||
Re-aggregation is still required whenever a value actually changes — that is necessary for `search_vector` consistency.
|
||
|
||
## Search Vector Size
|
||
|
||
- String custom field values are capped at **10,000 characters each**; there is no cap on the number of custom fields per member.
|
||
- `tsvector` has no hard size limit, but very large vectors (> ~100 KB) degrade GIN index maintenance, tsvector operations, and trigger time. Worst case: 100 fields × 10,000 chars ≈ 1 MB of aggregated text for one member.
|
||
- **Recommendation:** monitor `search_vector` size in production; consider capping total custom-field content per member if large vectors appear.
|
||
|
||
## Bulk Imports
|
||
|
||
The custom-field-value trigger fires once per row, so importing many members with custom fields is expensive. For bulk imports, **temporarily disable the `custom_field_values` trigger**, then re-aggregate `search_vector` in a batch after the import. The initial backfill migration also updates all members in a single transaction (table lock); for > 10,000 members, batch the backfill and run during a maintenance window.
|
||
|
||
## Search Query Structure
|
||
|
||
Full-text search uses the GIN index on `search_vector` (fast). Substring/custom-field matching adds `EXISTS (SELECT 1 FROM custom_field_values WHERE member_id = id AND ... LIKE ...)` subqueries, which are **not indexed** on the JSONB value (sequential scan) and run even when the FTS branch already matches. This is the main known weakness; it is acceptable at the current scale (< 30 custom fields/member, < 10,000 members) but is the first thing to revisit if search slows.
|
||
|
||
## Search Filter Functions
|
||
|
||
The search query in `lib/membership/member.ex` is split into modular filter builders, combined as a single OR-chain in priority order:
|
||
|
||
1. `build_fts_filter/1` — full-text search (highest priority, GIN-indexed, fastest).
|
||
2. `build_substring_filter/2` — `ILIKE` substring matching on structured fields (postal_code, house_number, email, city, country).
|
||
3. `build_custom_field_filter/1` — JSONB custom-field value matching via `EXISTS` subquery.
|
||
4. `build_fuzzy_filter/2` — trigram fuzzy matching on first_name, last_name, street (pg_trgm).
|
||
|
||
Priority: **FTS > Substring > Custom Fields > Fuzzy**.
|
||
|
||
## Monitoring Queries
|
||
|
||
```sql
|
||
-- search_vector size distribution
|
||
SELECT
|
||
pg_size_pretty(octet_length(search_vector::text)) AS size,
|
||
COUNT(*) AS member_count
|
||
FROM members
|
||
WHERE search_vector IS NOT NULL
|
||
GROUP BY octet_length(search_vector::text)
|
||
ORDER BY octet_length(search_vector::text) DESC
|
||
LIMIT 20;
|
||
|
||
-- average / max custom fields per member
|
||
SELECT
|
||
AVG(custom_field_count) AS avg_custom_fields,
|
||
MAX(custom_field_count) AS max_custom_fields
|
||
FROM (
|
||
SELECT member_id, COUNT(*) AS custom_field_count
|
||
FROM custom_field_values
|
||
GROUP BY member_id
|
||
) subq;
|
||
|
||
-- trigger execution time (requires pg_stat_statements)
|
||
SELECT mean_exec_time, calls, query
|
||
FROM pg_stat_statements
|
||
WHERE query LIKE '%members_search_vector_trigger%'
|
||
ORDER BY mean_exec_time DESC;
|
||
```
|
||
|
||
## Future Options (if scale demands)
|
||
|
||
- Generated/searchable text column or materialized view for custom-field substring search (to escape the unindexed JSONB `LIKE`).
|
||
- Limit which custom fields are searchable, or truncate long values.
|
||
- External search service (e.g., Elasticsearch) for advanced search.
|