mitgliederverwaltung/docs/database-schema-readme.md
Moritz 47f18e9ef3
All checks were successful
continuous-integration/drone/push Build is passing
docs: update the docs
2025-11-13 16:56:41 +01:00

464 lines
12 KiB
Markdown

# Database Schema Documentation
## Overview
This document provides a comprehensive overview of the Mila Membership Management System database schema.
## Quick Links
- **DBML File:** [`database_schema.dbml`](./database_schema.dbml)
- **Visualize Online:**
- [dbdiagram.io](https://dbdiagram.io) - Upload the DBML file
- [dbdocs.io](https://dbdocs.io) - Generate interactive documentation
## Schema Statistics
| Metric | Count |
|--------|-------|
| **Tables** | 5 |
| **Domains** | 2 (Accounts, Membership) |
| **Relationships** | 3 |
| **Indexes** | 15+ |
| **Triggers** | 1 (Full-text search) |
## Tables Overview
### Accounts Domain
#### `users`
- **Purpose:** User authentication and session management
- **Rows (Estimated):** Low to Medium (typically 10-50% of members)
- **Key Features:**
- Dual authentication (Password + OIDC)
- Optional 1:1 link to members
- Email as source of truth when linked
#### `tokens`
- **Purpose:** JWT token storage for AshAuthentication
- **Rows (Estimated):** Medium to High (multiple tokens per user)
- **Key Features:**
- Token lifecycle management
- Revocation support
- Multiple token purposes
### Membership Domain
#### `members`
- **Purpose:** Club member master data
- **Rows (Estimated):** High (core entity)
- **Key Features:**
- Complete member profile
- Full-text search via tsvector
- Bidirectional email sync with users
- Flexible address and contact data
#### `properties`
- **Purpose:** Dynamic custom member attributes
- **Rows (Estimated):** Variable (N per member)
- **Key Features:**
- Union type value storage (JSONB)
- Multiple data types supported
- One property per type per member
#### `property_types`
- **Purpose:** Schema definitions for custom properties
- **Rows (Estimated):** Low (admin-defined)
- **Key Features:**
- Type definitions
- Immutable and required flags
- Centralized property management
## Key Relationships
```
User (0..1) ←→ (0..1) Member
Tokens (N)
Member (1) → (N) Properties
PropertyType (1)
```
### Relationship Details
1. **User ↔ Member (Optional 1:1, both sides optional)**
- A User can have 0 or 1 Member (`user.member_id` can be NULL)
- A Member can have 0 or 1 User (optional `has_one` relationship)
- Both entities can exist independently
- Email synchronization when linked (User.email is source of truth)
- `ON DELETE SET NULL` on user side (User preserved when Member deleted)
2. **Member → Properties (1:N)**
- One member, many properties
- `ON DELETE CASCADE` - properties deleted with member
- Composite unique constraint (member_id, property_type_id)
3. **Property → PropertyType (N:1)**
- Properties reference type definition
- `ON DELETE RESTRICT` - cannot delete type if in use
- Type defines data structure
## Important Business Rules
### Email Synchronization
- **User.email** is the source of truth when linked
- On linking: Member.email ← User.email (overwrite)
- After linking: Changes sync bidirectionally
- Validation prevents email conflicts
### Authentication Strategies
- **Password:** Email + hashed_password
- **OIDC:** Email + oidc_id (Rauthy provider)
- At least one method required per user
### Member Constraints
- First name and last name required (min 1 char)
- Email unique, validated format (5-254 chars)
- Birth date cannot be in future
- Join date cannot be in future
- Exit date must be after join date
- Phone: `+?[0-9\- ]{6,20}`
- Postal code: 5 digits
### Property System
- Maximum one property per type per member
- Value stored as union type in JSONB
- Supported types: string, integer, boolean, date, email
- Types can be marked as immutable or required
## Indexes
### Performance Indexes
**members:**
- `search_vector` (GIN) - Full-text search (tsvector)
- `first_name` (GIN trgm) - Fuzzy search on first name
- `last_name` (GIN trgm) - Fuzzy search on last name
- `email` (GIN trgm) - Fuzzy search on email
- `city` (GIN trgm) - Fuzzy search on city
- `street` (GIN trgm) - Fuzzy search on street
- `notes` (GIN trgm) - Fuzzy search on notes
- `email` (B-tree) - Exact email lookups
- `last_name` (B-tree) - Name sorting
- `join_date` (B-tree) - Date filtering
- `paid` (partial B-tree) - Payment status queries
**properties:**
- `member_id` - Member property lookups
- `property_type_id` - Type-based queries
- Composite `(member_id, property_type_id)` - Uniqueness
**tokens:**
- `subject` - User token lookups
- `expires_at` - Token cleanup
- `purpose` - Purpose-based queries
**users:**
- `email` (unique) - Login lookups
- `oidc_id` (unique) - OIDC authentication
- `member_id` (unique) - Member linkage
## Full-Text Search
### Implementation
- **Trigger:** `members_search_vector_trigger()`
- **Function:** Automatically updates `search_vector` on INSERT/UPDATE
- **Index Type:** GIN (Generalized Inverted Index)
### Weighted Fields
- **Weight A (highest):** first_name, last_name
- **Weight B:** email, notes
- **Weight C:** birth_date, phone_number, city, street, house_number, postal_code
- **Weight D (lowest):** join_date, exit_date
### Usage Example
```sql
SELECT * FROM members
WHERE search_vector @@ to_tsquery('simple', 'john & doe');
```
## Fuzzy Search (Trigram-based)
### Implementation
- **Extension:** `pg_trgm` (PostgreSQL Trigram)
- **Index Type:** GIN with `gin_trgm_ops` operator class
- **Similarity Threshold:** 0.2 (default, configurable)
- **Added:** November 2025 (PR #187, closes #162)
### How It Works
Fuzzy search combines multiple search strategies:
1. **Full-text search** - Primary filter using tsvector
2. **Trigram similarity** - `similarity(field, query) > threshold`
3. **Word similarity** - `word_similarity(query, field) > threshold`
4. **Substring matching** - `LIKE` and `ILIKE` for exact substrings
5. **Modulo operator** - `query % field` for quick similarity check
### Indexed Fields for Fuzzy Search
- `first_name` - GIN trigram index
- `last_name` - GIN trigram index
- `email` - GIN trigram index
- `city` - GIN trigram index
- `street` - GIN trigram index
- `notes` - GIN trigram index
### Usage Example (Ash Action)
```elixir
# In LiveView or context
Member.fuzzy_search(Member, query: "john", similarity_threshold: 0.2)
# Or using Ash Query directly
Member
|> Ash.Query.for_read(:search, %{query: "john", similarity_threshold: 0.2})
|> Mv.Membership.read!()
```
### Usage Example (SQL)
```sql
-- Trigram similarity search
SELECT * FROM members
WHERE similarity(first_name, 'john') > 0.2
OR similarity(last_name, 'doe') > 0.2
ORDER BY similarity(first_name, 'john') DESC;
-- Word similarity (better for partial matches)
SELECT * FROM members
WHERE word_similarity('john', first_name) > 0.2;
-- Quick similarity check with % operator
SELECT * FROM members
WHERE 'john' % first_name;
```
### Performance Considerations
- **GIN indexes** speed up trigram operations significantly
- **Similarity threshold** of 0.2 balances precision and recall
- **Combined approach** (FTS + trigram) provides best results
- Lower threshold = more results but less specific
## Database Extensions
### Required PostgreSQL Extensions
1. **uuid-ossp**
- Purpose: UUID generation functions
- Used for: `gen_random_uuid()`, `uuid_generate_v7()`
2. **citext**
- Purpose: Case-insensitive text type
- Used for: `users.email` (case-insensitive email matching)
3. **pg_trgm**
- Purpose: Trigram-based fuzzy text search and similarity matching
- Used for: Fuzzy member search with similarity scoring
- Operators: `%` (similarity), `word_similarity()`, `similarity()`
- Added in: Migration `20251001141005_add_trigram_to_members.exs`
### Installation
```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "citext";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";
```
## Migration Strategy
### Ash Migrations
This project uses Ash Framework's migration system:
```bash
# Generate new migration
mix ash.codegen --name add_new_feature
# Apply migrations
mix ash.setup
# Rollback migrations
mix ash_postgres.rollback -n 1
```
### Migration Files Location
```
priv/repo/migrations/
├── 20250421101957_initialize_extensions_1.exs
├── 20250528163901_initial_migration.exs
├── 20250617090641_member_fields.exs
├── 20250620110850_add_accounts_domain.exs
├── 20250912085235_AddSearchVectorToMembers.exs
├── 20250926180341_add_unique_email_to_members.exs
├── 20251001141005_add_trigram_to_members.exs
└── 20251016130855_add_constraints_for_user_member_and_property.exs
```
## Data Integrity
### Foreign Key Behaviors
| Relationship | On Delete | Rationale |
|--------------|-----------|-----------|
| `users.member_id → members.id` | SET NULL | Preserve user account when member deleted |
| `properties.member_id → members.id` | CASCADE | Delete properties with member |
| `properties.property_type_id → property_types.id` | RESTRICT | Prevent deletion of types in use |
### Validation Layers
1. **Database Level:**
- CHECK constraints
- NOT NULL constraints
- UNIQUE indexes
- Foreign key constraints
2. **Application Level (Ash):**
- Custom validators
- Email format validation (EctoCommons.EmailValidator)
- Business rule validation
- Cross-entity validation
3. **UI Level:**
- Client-side form validation
- Real-time feedback
- Error messages
## Performance Considerations
### Query Patterns
**High Frequency:**
- Member search (uses GIN index on search_vector)
- Member list with filters (uses indexes on join_date, paid)
- User authentication (uses unique index on email/oidc_id)
- Property lookups by member (uses index on member_id)
**Medium Frequency:**
- Member CRUD operations
- Property updates
- Token validation
**Low Frequency:**
- PropertyType management
- User-Member linking
- Bulk operations
### Optimization Tips
1. **Use indexes:** All critical query paths have indexes
2. **Preload relationships:** Use Ash's `load` to avoid N+1
3. **Pagination:** Use keyset pagination (configured by default)
4. **Partial indexes:** `members.paid` index only non-NULL values
5. **Search optimization:** Full-text search via tsvector, not LIKE
## Visualization
### Using dbdiagram.io
1. Visit [https://dbdiagram.io](https://dbdiagram.io)
2. Click "Import" → "From file"
3. Upload `database_schema.dbml`
4. View interactive diagram with relationships
### Using dbdocs.io
1. Install dbdocs CLI: `npm install -g dbdocs`
2. Generate docs: `dbdocs build database_schema.dbml`
3. View generated documentation
### VS Code Extension
Install "DBML Language" extension to view/edit DBML files with:
- Syntax highlighting
- Inline documentation
- Error checking
## Security Considerations
### Sensitive Data
**Encrypted:**
- `users.hashed_password` (bcrypt)
**Should Not Log:**
- hashed_password
- tokens (jti, purpose, extra_data)
**Personal Data (GDPR):**
- All member fields (name, email, birth_date, address)
- User email
- Token subject
### Access Control
- Implement through Ash policies
- Row-level security considerations for future
- Audit logging for sensitive operations
## Backup Recommendations
### Critical Tables (Priority 1)
- `members` - Core business data
- `users` - Authentication data
- `property_types` - Schema definitions
### Important Tables (Priority 2)
- `properties` - Member custom data
- `tokens` - Can be regenerated but good to backup
### Backup Strategy
```bash
# Full database backup
pg_dump -Fc mv_prod > backup_$(date +%Y%m%d).dump
# Restore
pg_restore -d mv_prod backup_20251110.dump
```
## Testing
### Test Database
- Separate test database: `mv_test`
- Sandbox mode via Ecto.Adapters.SQL.Sandbox
- Reset between tests
### Seed Data
```bash
# Load seed data
mix run priv/repo/seeds.exs
```
## Future Considerations
### Potential Additions
1. **Audit Log Table**
- Track changes to members
- Compliance and history tracking
2. **Payment Tracking**
- Payment history table
- Transaction records
- Fee calculation
3. **Document Storage**
- Member documents/attachments
- File metadata table
4. **Email Queue**
- Outbound email tracking
- Delivery status
5. **Roles & Permissions**
- User roles (admin, treasurer, member)
- Permission management
## Resources
- **Ash Framework:** [https://hexdocs.pm/ash](https://hexdocs.pm/ash)
- **AshPostgres:** [https://hexdocs.pm/ash_postgres](https://hexdocs.pm/ash_postgres)
- **DBML Specification:** [https://dbml.dbdiagram.io](https://dbml.dbdiagram.io)
- **PostgreSQL Docs:** [https://www.postgresql.org/docs/](https://www.postgresql.org/docs/)
---
**Last Updated:** 2025-11-13
**Schema Version:** 1.1
**Database:** PostgreSQL 17.6 (dev) / 16 (prod)