docs(db): refresh, condense and align database and groups docs

This commit is contained in:
Moritz 2026-06-15 21:53:36 +02:00
parent 5d8f173529
commit 0b36a43edc
4 changed files with 360 additions and 1875 deletions

View file

@ -4,105 +4,54 @@
This document provides a comprehensive overview of the Mila Membership Management System database schema.
## Quick Links
- **DBML file:** [`database_schema.dbml`](./database_schema.dbml) — full per-column intent notes and relationship edges.
- **Search-vector performance:** see [`custom-fields-search-performance.md`](./custom-fields-search-performance.md) for trigger cost analysis and tuning.
- **DBML File:** [`database_schema.dbml`](./database_schema.dbml)
- **Visualize Online:**
- [dbdiagram.io](https://dbdiagram.io) - Upload the DBML file
- [dbdocs.io](https://dbdocs.io) - Generate interactive documentation
The DBML is **hand-maintained** (not auto-generated); keep it in sync with `priv/repo/migrations/`.
## Schema Statistics
| Metric | Count |
|--------|-------|
| **Tables** | 11 |
| **Tables** | 12 |
| **Domains** | 4 (Accounts, Membership, MembershipFees, Authorization) |
| **Relationships** | 9 |
| **Indexes** | 25+ |
| **Triggers** | 1 (Full-text search) |
| **Triggers** | 3 (member, custom_field_values, member_groups → member search-vector) |
## Tables Overview
### Accounts Domain
- **`users`** — authentication accounts. Dual auth (Password + OIDC), optional 1:1 link to a member; email is the source of truth when linked.
- **`tokens`** — JWT storage for AshAuthentication; multiple purposes, revocation by deletion.
#### `users`
- **Purpose:** User authentication and session management
- **Rows (Estimated):** Low to Medium (typically 10-50% of members)
- **Key Features:**
- Dual authentication (Password + OIDC)
- Optional 1:1 link to members
- Email as source of truth when linked
#### `tokens`
- **Purpose:** JWT token storage for AshAuthentication
- **Rows (Estimated):** Medium to High (multiple tokens per user)
- **Key Features:**
- Token lifecycle management
- Revocation support
- Multiple token purposes
OIDC account linking is recorded on the `users` table via the `oidc_id` column; there is no separate `user_identities` table.
### Membership Domain
#### `members`
- **Purpose:** Club member master data
- **Rows (Estimated):** High (core entity)
- **Key Features:**
- Complete member profile
- Full-text search via tsvector
- Bidirectional email sync with users
- Flexible address and contact data
#### `custom_field_values`
- **Purpose:** Dynamic custom member attributes
- **Rows (Estimated):** Variable (N per member)
- **Key Features:**
- Union type value storage (JSONB)
- Multiple data types supported
- One custom field value per custom field per member
#### `custom_fields`
- **Purpose:** Schema definitions for custom_field_values
- **Rows (Estimated):** Low (admin-defined)
- **Key Features:**
- Type definitions
- Immutable and required flags
- Centralized custom field management
#### `settings`
- **Purpose:** Global application settings (singleton resource)
- **Rows (Estimated):** 1 (singleton pattern)
- **Key Features:**
- Club name configuration
- Member field visibility settings
- Membership fee default settings
- Environment variable support for club name
#### `groups`
- **Purpose:** Group definitions for organizing members
- **Rows (Estimated):** Low (typically 5-20 groups per club)
- **Key Features:**
- Unique group names (case-insensitive)
- URL-friendly slugs (auto-generated, immutable)
- Optional descriptions
- Many-to-many relationship with members
#### `member_groups`
- **Purpose:** Join table for many-to-many relationship between members and groups
- **Rows (Estimated):** Medium to High (multiple groups per member)
- **Key Features:**
- Unique constraint on (member_id, group_id)
- CASCADE delete on both sides
- Efficient indexes for queries
- **`members`** — club member master data. Full-text + fuzzy search, bidirectional email sync with users, flexible address/contact data, `country`, optional `vereinfacht_contact_id` (external vereinfacht.de contact).
- **`custom_field_values`** — dynamic per-member attributes. Union-type value in JSONB; one value per custom field per member.
- **`custom_fields`** — schema definitions for custom field values (type, `required`/`show_in_overview` flags, optional `join_description`, auto-generated slug).
- **`settings`** — global application settings (singleton). Club name (also via `ASSOCIATION_NAME` env), member-field visibility/required maps, fee defaults, plus OIDC, SMTP/mail-from, vereinfacht.de, public join-form, `registration_enabled`, and `oidc_only` configuration. See [Settings configuration columns](#settings-configuration-columns).
- **`groups`** — member groupings. Case-insensitive-unique names, auto-generated immutable slugs, optional descriptions; many-to-many with members.
- **`member_groups`** — join table for members ↔ groups. Unique `(member_id, group_id)`, CASCADE delete on both sides (join table only).
- **`join_requests`** — public join flow (onboarding, double opt-in). Status machine `pending_confirmation → submitted → approved/rejected`; confirmation token stored as hash only, ~24h retention for unconfirmed records.
### Authorization Domain
- **`roles`** — RBAC. Links users to one of four hardcoded permission sets (`own_data`, `read_only`, `normal_user`, `admin`); system roles are deletion-protected.
#### `roles`
- **Purpose:** Role-based access control (RBAC)
- **Rows (Estimated):** Low (typically 3-10 roles)
- **Key Features:**
- Links users to permission sets
- System role protection
- Four hardcoded permission sets: own_data, read_only, normal_user, admin
### MembershipFees Domain
- **`membership_fee_types`** — fee types with immutable billing interval.
- **`membership_fee_cycles`** — per-member billing cycles with payment status.
## Settings configuration columns
The singleton `settings` row carries runtime configuration (all nullable unless noted). Grouped by area:
- **Member overview:** `member_field_visibility` (JSONB; absent key = visible), `member_field_required` (JSONB).
- **Membership fees:** `include_joining_cycle` (bool, NOT NULL, default true), `default_membership_fee_type_id` (FK → membership_fee_types, ON DELETE SET NULL).
- **Registration / login:** `registration_enabled` (bool, NOT NULL, default true), `oidc_only` (bool, NOT NULL, default false).
- **OIDC:** `oidc_client_id`, `oidc_client_secret`, `oidc_base_url`, `oidc_redirect_uri`, `oidc_admin_group_name`, `oidc_groups_claim`.
- **SMTP / mail-from:** `smtp_host`, `smtp_port` (bigint), `smtp_username`, `smtp_password`, `smtp_ssl`, `smtp_from_name`, `smtp_from_email`.
- **vereinfacht.de:** `vereinfacht_api_url`, `vereinfacht_api_key`, `vereinfacht_club_id`, `vereinfacht_app_url`.
- **Public join form:** `join_form_enabled` (bool, NOT NULL, default false), `join_form_field_ids` (text[]), `join_form_field_required` (JSONB).
## Key Relationships
@ -124,123 +73,54 @@ Member (N) ←→ (N) Group
Settings (1) → MembershipFeeType (0..1)
```
### Relationship Details
## Foreign Key On-Delete Behavior
1. **User ↔ Member (Optional 1:1, both sides optional)**
- A User can have 0 or 1 Member (`user.member_id` can be NULL)
- A Member can have 0 or 1 User (optional `has_one` relationship)
- Both entities can exist independently
- Email synchronization when linked (User.email is source of truth)
- `ON DELETE SET NULL` on user side (User preserved when Member deleted)
| Relationship | On Delete | Rationale |
|--------------|-----------|-----------|
| `users.member_id → members.id` | SET NULL | Preserve user account when member deleted |
| `users.role_id → roles.id` | RESTRICT | Cannot delete a role that still has users |
| `custom_field_values.member_id → members.id` | CASCADE | Delete values with member |
| `custom_field_values.custom_field_id → custom_fields.id` | CASCADE | Delete values when the custom field is deleted |
| `members.membership_fee_type_id → membership_fee_types.id` | RESTRICT | Cannot delete a fee type assigned to members |
| `membership_fee_cycles.member_id → members.id` | CASCADE | Cycles deleted with member |
| `membership_fee_cycles.membership_fee_type_id → membership_fee_types.id` | RESTRICT | Cannot delete a fee type with cycles |
| `settings.default_membership_fee_type_id → membership_fee_types.id` | SET NULL | Clear default if fee type deleted |
| `member_groups.member_id → members.id` | CASCADE | Association removed; member preserved |
| `member_groups.group_id → groups.id` | CASCADE | Association removed; group preserved |
2. **User → Role (N:1)**
- Many users can be assigned to one role
- `ON DELETE RESTRICT` - cannot delete role if users are assigned
- Role links user to permission set for authorization
`join_requests.reviewed_by_user_id` is intentionally **unconstrained** (no FK); `reviewed_by_display` is denormalized so the UI need not load the reviewer User.
3. **Member → CustomFieldValues (1:N)**
- One member, many custom_field_values
- `ON DELETE CASCADE` - custom_field_values deleted with member
- Composite unique constraint (member_id, custom_field_id)
4. **CustomFieldValue → CustomField (N:1)**
- Custom field values reference type definition
- `ON DELETE RESTRICT` - cannot delete type if in use
- Type defines data structure
5. **Member → MembershipFeeType (N:1, optional)**
- Many members can be assigned to one fee type
- `ON DELETE RESTRICT` - cannot delete fee type if members are assigned
- Optional relationship (member can have no fee type)
6. **Member → MembershipFeeCycles (1:N)**
- One member, many billing cycles
- `ON DELETE CASCADE` - cycles deleted when member deleted
- Unique constraint (member_id, cycle_start)
7. **MembershipFeeCycle → MembershipFeeType (N:1)**
- Many cycles reference one fee type
- `ON DELETE RESTRICT` - cannot delete fee type if cycles exist
8. **Settings → MembershipFeeType (N:1, optional)**
- Settings can reference a default fee type
- `ON DELETE SET NULL` - if fee type is deleted, setting is cleared
9. **Member ↔ Group (N:N via MemberGroup)**
- Many-to-many relationship through `member_groups` join table
- `ON DELETE CASCADE` on both sides - removing member/group removes associations
- Unique constraint on (member_id, group_id) prevents duplicates
- Groups searchable via member search vector
**User ↔ Member** is an optional 1:1 (both sides may be NULL; entities exist independently). **Member ↔ Group** is many-to-many through `member_groups` (CASCADE lives only on the join table).
## Important Business Rules
### Email Synchronization
- **User.email** is the source of truth when linked
- On linking: Member.email ← User.email (overwrite)
- After linking: Changes sync bidirectionally
- Validation prevents email conflicts
- **User.email is the source of truth when linked.** On linking, `Member.email ← User.email` (overwrite). Afterwards changes sync bidirectionally. Validation prevents email conflicts with other unlinked users.
### Authentication Strategies
- **Password:** Email + hashed_password
- **OIDC:** Email + oidc_id (Rauthy provider)
- At least one method required per user
- **Password:** email + hashed_password. **OIDC:** email + oidc_id (Rauthy provider), the external identity recorded via the `oidc_id` column on `users`. At least one method required per user.
### Member Constraints
- First name and last name required (min 1 char)
- Email unique, validated format (5-254 chars)
- Exit date must be after join date
- Phone: `+?[0-9\- ]{6,20}`
- Postal code: optional (no format validation)
- Country: optional
- `first_name` / `last_name`: optional, but if present min 1 char.
- `email`: unique, validated format (5254 chars).
- `exit_date` must be after `join_date`.
- `postal_code`, `country`: optional, no format validation.
### CustomFieldValue System
- Maximum one custom field value per custom field per member
- Value stored as union type in JSONB
- Supported types: string, integer, boolean, date, email
- Types can be marked as immutable or required
## Indexes
### Performance Indexes
**members:**
- `search_vector` (GIN) - Full-text search (tsvector)
- `first_name` (GIN trgm) - Fuzzy search on first name
- `last_name` (GIN trgm) - Fuzzy search on last name
- `email` (GIN trgm) - Fuzzy search on email
- `city` (GIN trgm) - Fuzzy search on city
- `street` (GIN trgm) - Fuzzy search on street
- `notes` (GIN trgm) - Fuzzy search on notes
- `email` (B-tree) - Exact email lookups
- `last_name` (B-tree) - Name sorting
- `join_date` (B-tree) - Date filtering
**custom_field_values:**
- `member_id` - Member custom field value lookups
- `custom_field_id` - Type-based queries
- Composite `(member_id, custom_field_id)` - Uniqueness
**tokens:**
- `subject` - User token lookups
- `expires_at` - Token cleanup
- `purpose` - Purpose-based queries
**users:**
- `email` (unique) - Login lookups
- `oidc_id` (unique) - OIDC authentication
- `member_id` (unique) - Member linkage
- One value per custom field per member. Value stored as a union type in JSONB: `{type: "string|integer|boolean|date|email", value: <actual_value>}`. Custom fields can be marked `required` and toggled `show_in_overview`.
## Full-Text Search
### Implementation
- **Trigger** on `members` (INSERT/UPDATE): runs function `members_search_vector_trigger()`
- **Trigger** on `members` (INSERT/UPDATE): `update_search_vector` runs function `members_search_vector_trigger()`
- **Trigger** on `custom_field_values` (INSERT/UPDATE/DELETE): `update_member_search_vector_on_custom_field_value_change` runs function `update_member_search_vector_from_custom_field_value()`
- **Trigger** on `member_groups` (INSERT/UPDATE/DELETE): `update_member_search_vector_on_member_groups_change` runs function `update_member_search_vector_from_member_groups()`
- **Index Type:** GIN (Generalized Inverted Index)
### Weighted Fields
- **Weight A (highest):** first_name, last_name
- **Weight B:** email, notes, group names (from member_groups → groups)
- **Weight C:** city, street, house_number, postal_code, country, custom_field_values
- **Weight C:** city, street, house_number, postal_code, custom_field_values
- **Weight D (lowest):** join_date, exit_date
### Group Names in Search
@ -258,291 +138,46 @@ Custom field values are automatically included in the search vector:
### Usage Example
```sql
SELECT * FROM members
SELECT * FROM members
WHERE search_vector @@ to_tsquery('simple', 'john & doe');
```
## Fuzzy Search (Trigram-based)
### Implementation
- **Extension:** `pg_trgm` (PostgreSQL Trigram)
- **Index Type:** GIN with `gin_trgm_ops` operator class
- **Similarity Threshold:** 0.2 (default, configurable)
- **Added:** November 2025 (PR #187, closes #162)
- **Extension:** `pg_trgm`; GIN indexes with `gin_trgm_ops` on `first_name`, `last_name`, `email`, `city`, `street`, `notes`.
- **Similarity threshold:** 0.2 (default, configurable) — balances precision/recall.
- **Added:** November 2025 (PR #187, closes #162).
### How It Works
Fuzzy search combines multiple search strategies:
1. **Full-text search** - Primary filter using tsvector
2. **Trigram similarity** - `similarity(field, query) > threshold`
3. **Word similarity** - `word_similarity(query, field) > threshold`
4. **Substring matching** - `LIKE` and `ILIKE` for exact substrings
5. **Modulo operator** - `query % field` for quick similarity check
Fuzzy search combines several strategies (applied as an OR-chain alongside full-text and substring matching):
### Indexed Fields for Fuzzy Search
- `first_name` - GIN trigram index
- `last_name` - GIN trigram index
- `email` - GIN trigram index
- `city` - GIN trigram index
- `street` - GIN trigram index
- `notes` - GIN trigram index
1. Full-text search — primary filter via tsvector.
2. Trigram similarity — `similarity(field, query) > threshold`.
3. Word similarity — `word_similarity(query, field) > threshold`.
4. Substring matching — `LIKE` / `ILIKE`.
5. `%` operator — quick trigram-similarity check.
### Usage Example (Ash Action)
```elixir
# In LiveView or context
Member.fuzzy_search(Member, query: "john", similarity_threshold: 0.2)
# Or using Ash Query directly
Member
|> Ash.Query.for_read(:search, %{query: "john", similarity_threshold: 0.2})
|> Mv.Membership.read!()
```
### Usage Example (SQL)
```sql
-- Trigram similarity search
SELECT * FROM members
WHERE similarity(first_name, 'john') > 0.2
OR similarity(last_name, 'doe') > 0.2
ORDER BY similarity(first_name, 'john') DESC;
-- Word similarity (better for partial matches)
SELECT * FROM members
WHERE word_similarity('john', first_name) > 0.2;
-- Quick similarity check with % operator
SELECT * FROM members
WHERE 'john' % first_name;
```
### Performance Considerations
- **GIN indexes** speed up trigram operations significantly
- **Similarity threshold** of 0.2 balances precision and recall
- **Combined approach** (FTS + trigram) provides best results
- Lower threshold = more results but less specific
For the Elixir search action and per-strategy filter functions, see `lib/membership/member.ex` and [`custom-fields-search-performance.md`](./custom-fields-search-performance.md).
## Database Extensions
### Required PostgreSQL Extensions
Installed extensions are defined in `Mv.Repo.installed_extensions/0`:
1. **uuid-ossp**
- Purpose: UUID generation functions
- Used for: `gen_random_uuid()`, `uuid_generate_v7()`
| Extension | Purpose | Notes |
|-----------|---------|-------|
| `ash-functions` | Ash helper SQL functions | installed by Ash |
| `citext` | Case-insensitive text | `users.email` |
| `pg_trgm` | Trigram fuzzy search | added in `20251001141005_add_trigram_to_members.exs`; operators `%`, `similarity()`, `word_similarity()` |
2. **citext**
- Purpose: Case-insensitive text type
- Used for: `users.email` (case-insensitive email matching)
`gen_random_uuid()` is built into PostgreSQL; `uuid_generate_v7()` is a custom SQL function defined in a migration (not provided by an extension).
3. **pg_trgm**
- Purpose: Trigram-based fuzzy text search and similarity matching
- Used for: Fuzzy member search with similarity scoring
- Operators: `%` (similarity), `word_similarity()`, `similarity()`
- Added in: Migration `20251001141005_add_trigram_to_members.exs`
## Sensitive Data (GDPR / logging)
### Installation
```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "citext";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";
```
## Migration Strategy
### Ash Migrations
This project uses Ash Framework's migration system:
```bash
# Generate new migration
mix ash.codegen --name add_new_feature
# Apply migrations
mix ash.setup
# Rollback migrations
mix ash_postgres.rollback -n 1
```
### Migration Files Location
```
priv/repo/migrations/
├── 20250421101957_initialize_extensions_1.exs
├── 20250528163901_initial_migration.exs
├── 20250617090641_member_fields.exs
├── 20250620110850_add_accounts_domain.exs
├── 20250912085235_AddSearchVectorToMembers.exs
├── 20250926180341_add_unique_email_to_members.exs
├── 20251001141005_add_trigram_to_members.exs
└── 20251016130855_add_constraints_for_user_member_and_property.exs
```
## Data Integrity
### Foreign Key Behaviors
| Relationship | On Delete | Rationale |
|--------------|-----------|-----------|
| `users.member_id → members.id` | SET NULL | Preserve user account when member deleted |
| `custom_field_values.member_id → members.id` | CASCADE | Delete custom_field_values with member |
| `custom_field_values.custom_field_id → custom_fields.id` | RESTRICT | Prevent deletion of types in use |
### Validation Layers
1. **Database Level:**
- CHECK constraints
- NOT NULL constraints
- UNIQUE indexes
- Foreign key constraints
2. **Application Level (Ash):**
- Custom validators
- Email format validation (EctoCommons.EmailValidator)
- Business rule validation
- Cross-entity validation
3. **UI Level:**
- Client-side form validation
- Real-time feedback
- Error messages
## Performance Considerations
### Query Patterns
**High Frequency:**
- Member search (uses GIN index on search_vector)
- Member list with filters (uses indexes on join_date, membership_fee_type_id)
- User authentication (uses unique index on email/oidc_id)
- CustomFieldValue lookups by member (uses index on member_id)
**Medium Frequency:**
- Member CRUD operations
- CustomFieldValue updates
- Token validation
**Low Frequency:**
- CustomField management
- User-Member linking
- Bulk operations
### Optimization Tips
1. **Use indexes:** All critical query paths have indexes
2. **Preload relationships:** Use Ash's `load` to avoid N+1
3. **Pagination:** Use keyset pagination (configured by default)
4. **GIN indexes:** Full-text search and fuzzy search on multiple fields
5. **Search optimization:** Full-text search via tsvector, not LIKE
## Visualization
### Using dbdiagram.io
1. Visit [https://dbdiagram.io](https://dbdiagram.io)
2. Click "Import" → "From file"
3. Upload `database_schema.dbml`
4. View interactive diagram with relationships
### Using dbdocs.io
1. Install dbdocs CLI: `npm install -g dbdocs`
2. Generate docs: `dbdocs build database_schema.dbml`
3. View generated documentation
### VS Code Extension
Install "DBML Language" extension to view/edit DBML files with:
- Syntax highlighting
- Inline documentation
- Error checking
## Security Considerations
### Sensitive Data
**Encrypted:**
- `users.hashed_password` (bcrypt)
**Should Not Log:**
- hashed_password
- tokens (jti, purpose, extra_data)
**Personal Data (GDPR):**
- All member fields (name, email, address)
- User email
- Token subject
### Access Control
- Implement through Ash policies
- Row-level security considerations for future
- Audit logging for sensitive operations
## Backup Recommendations
### Critical Tables (Priority 1)
- `members` - Core business data
- `users` - Authentication data
- `custom_fields` - Schema definitions
### Important Tables (Priority 2)
- `custom_field_values` - Member custom data
- `tokens` - Can be regenerated but good to backup
### Backup Strategy
```bash
# Full database backup
pg_dump -Fc mv_prod > backup_$(date +%Y%m%d).dump
# Restore
pg_restore -d mv_prod backup_20251110.dump
```
## Testing
### Test Database
- Separate test database: `mv_test`
- Sandbox mode via Ecto.Adapters.SQL.Sandbox
- Reset between tests
### Seed Data
```bash
# Load seed data
mix run priv/repo/seeds.exs
```
## Future Considerations
### Potential Additions
1. **Audit Log Table**
- Track changes to members
- Compliance and history tracking
2. **Payment Tracking**
- Payment history table
- Transaction records
- Fee calculation
3. **Document Storage**
- Member documents/attachments
- File metadata table
4. **Email Queue**
- Outbound email tracking
- Delivery status
5. **Roles & Permissions**
- User roles (admin, treasurer, member)
- Permission management
## Resources
- **Ash Framework:** [https://hexdocs.pm/ash](https://hexdocs.pm/ash)
- **AshPostgres:** [https://hexdocs.pm/ash_postgres](https://hexdocs.pm/ash_postgres)
- **DBML Specification:** [https://dbml.dbdiagram.io](https://dbml.dbdiagram.io)
- **PostgreSQL Docs:** [https://www.postgresql.org/docs/](https://www.postgresql.org/docs/)
- **Never log:** `users.hashed_password` (bcrypt), token fields (`jti`, `purpose`, `extra_data`), OIDC/SMTP/vereinfacht secrets in `settings`.
- **Personal data:** all member fields, user email, join-request applicant data.
---
**Last Updated:** 2026-01-27
**Schema Version:** 1.5
**Last Updated:** 2026-06-15
**Schema Version:** 1.6 (12 tables)
**Database:** PostgreSQL 17.6 (dev) / 16 (prod)