Database Schema Documentation

Overview

This document provides a comprehensive overview of the Mila Membership Management System database schema.

Schema Statistics

| Metric | Count |
|---|---|
| Tables | 5 |
| Domains | 2 (Accounts, Membership) |
| Relationships | 3 |
| Indexes | 15+ |
| Triggers | 1 (Full-text search) |

Tables Overview

Accounts Domain

users

  • Purpose: User authentication and session management
  • Rows (Estimated): Low to Medium (typically 10-50% of members)
  • Key Features:
    • Dual authentication (Password + OIDC)
    • Optional 1:1 link to members
    • Email as source of truth when linked

tokens

  • Purpose: JWT token storage for AshAuthentication
  • Rows (Estimated): Medium to High (multiple tokens per user)
  • Key Features:
    • Token lifecycle management
    • Revocation support
    • Multiple token purposes

Membership Domain

members

  • Purpose: Club member master data
  • Rows (Estimated): High (core entity)
  • Key Features:
    • Complete member profile
    • Full-text search via tsvector
    • Bidirectional email sync with users
    • Flexible address and contact data

properties

  • Purpose: Dynamic custom member attributes
  • Rows (Estimated): Variable (N per member)
  • Key Features:
    • Union type value storage (JSONB)
    • Multiple data types supported
    • One property per type per member

property_types

  • Purpose: Schema definitions for custom properties
  • Rows (Estimated): Low (admin-defined)
  • Key Features:
    • Type definitions
    • Immutable and required flags
    • Centralized property management

Key Relationships

User (0..1) ←→ (0..1) Member
       ↓
    Tokens (N)

Member (1) → (N) Properties
                    ↓
              PropertyType (1)

Relationship Details

  1. User ↔ Member (Optional 1:1, both sides optional)

    • A User can have 0 or 1 Member (user.member_id can be NULL)
    • A Member can have 0 or 1 User (optional has_one relationship)
    • Both entities can exist independently
    • Email synchronization when linked (User.email is source of truth)
    • ON DELETE SET NULL on user side (User preserved when Member deleted)
  2. Member → Properties (1:N)

    • One member, many properties
    • ON DELETE CASCADE - properties deleted with member
    • Composite unique constraint (member_id, property_type_id)
  3. Property → PropertyType (N:1)

    • Properties reference type definition
    • ON DELETE RESTRICT - cannot delete type if in use
    • Type defines data structure

Important Business Rules

Email Synchronization

  • User.email is the source of truth when linked
  • On linking: Member.email ← User.email (overwrite)
  • After linking: Changes sync bidirectionally
  • Validation prevents email conflicts
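
The synchronization rules above can be modelled in a few lines. This is an illustrative Python sketch, not the project's Ash implementation: the names `User`, `Member`, `link`, `change_user_email`, and `change_member_email` are hypothetical, and the conflict validation is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    email: str


@dataclass
class User:
    email: str
    member: Optional[Member] = None


def link(user: User, member: Member) -> None:
    """On linking, User.email is the source of truth and overwrites Member.email."""
    user.member = member
    member.email = user.email


def change_user_email(user: User, new_email: str) -> None:
    """A change on the user side is mirrored to the linked member, if any."""
    user.email = new_email
    if user.member is not None:
        user.member.email = new_email


def change_member_email(user: User, new_email: str) -> None:
    """A change on the member side is mirrored back to the linked user."""
    if user.member is None:
        raise ValueError("user has no linked member")
    user.member.email = new_email
    user.email = new_email


user = User(email="anna@example.org")
member = Member(email="old@example.org")

link(user, member)
email_after_link = member.email  # the member's old address was overwritten

change_member_email(user, "new@example.org")
```

After `link`, both records hold the user's address; after `change_member_email`, both hold the new one.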

Authentication Strategies

  • Password: Email + hashed_password
  • OIDC: Email + oidc_id (Rauthy provider)
  • At least one method required per user
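
As a rough illustration of the "at least one method" rule, the following Python sketch derives the usable strategies from the two credential columns. The function name `auth_strategies` is hypothetical, and the real check lives in the Ash resource rather than in code like this.

```python
from typing import List, Optional


def auth_strategies(hashed_password: Optional[str], oidc_id: Optional[str]) -> List[str]:
    """Return the strategies a user can sign in with; reject users with none."""
    strategies = []
    if hashed_password:
        strategies.append("password")
    if oidc_id:
        strategies.append("oidc")
    if not strategies:
        raise ValueError("user needs a password or an OIDC identity")
    return strategies


password_only = auth_strategies("some-bcrypt-hash", None)
both = auth_strategies("some-bcrypt-hash", "rauthy-subject-id")

try:
    auth_strategies(None, None)
    rejected = False
except ValueError:
    rejected = True
```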

Member Constraints

  • First name and last name required (min 1 char)
  • Email unique, validated format (5-254 chars)
  • Birth date cannot be in future
  • Join date cannot be in future
  • Exit date must be after join date
  • Phone: optional leading +, then 6-20 digits, hyphens, or spaces (\+?[0-9\- ]{6,20})
  • Postal code: 5 digits
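
The constraints above can be expressed as a small validator. This is a Python sketch under stated assumptions: the field names follow the column names used elsewhere in this document, `member_errors` is a hypothetical helper, the email check only covers the length rule (format validation is handled separately), and uniqueness is not checked.

```python
import re
from datetime import date
from typing import List, Optional

# Patterns taken from the rules above; fullmatch() anchors them to the whole value.
PHONE_RE = re.compile(r"\+?[0-9\- ]{6,20}")
POSTAL_CODE_RE = re.compile(r"[0-9]{5}")


def member_errors(m: dict, today: Optional[date] = None) -> List[str]:
    """Return a list of rule violations for a member given as a plain dict."""
    today = today or date.today()
    errors = []
    for field in ("first_name", "last_name"):
        if len(m.get(field) or "") < 1:
            errors.append(f"{field} is required")
    if not 5 <= len(m.get("email") or "") <= 254:
        errors.append("email must be 5-254 characters")
    for field in ("birth_date", "join_date"):
        value = m.get(field)
        if value is not None and value > today:
            errors.append(f"{field} cannot be in the future")
    join, exit_ = m.get("join_date"), m.get("exit_date")
    if join is not None and exit_ is not None and exit_ <= join:
        errors.append("exit_date must be after join_date")
    phone = m.get("phone_number")
    if phone is not None and not PHONE_RE.fullmatch(phone):
        errors.append("phone_number has an invalid format")
    postal = m.get("postal_code")
    if postal is not None and not POSTAL_CODE_RE.fullmatch(postal):
        errors.append("postal_code must be 5 digits")
    return errors


today = date(2025, 11, 13)
valid_errors = member_errors(
    {"first_name": "Anna", "last_name": "Schmidt", "email": "anna@example.org",
     "birth_date": date(1990, 5, 1), "join_date": date(2020, 1, 15),
     "phone_number": "+49 30 123456", "postal_code": "10115"},
    today,
)
invalid_errors = member_errors(
    {"first_name": "", "last_name": "X", "email": "a@b",
     "join_date": date(2026, 1, 1), "exit_date": date(2025, 1, 1),
     "phone_number": "12ab", "postal_code": "1234"},
    today,
)
```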

Property System

  • Maximum one property per type per member
  • Value stored as union type in JSONB
  • Supported types: string, integer, boolean, date, email
  • Types can be marked as immutable or required
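
To make the union-type storage more concrete, here is a Python sketch of a tagged JSON value plus the one-property-per-type rule. The `{"type": ..., "value": ...}` layout is an assumption about how the JSONB column is shaped, not something this document specifies, and `encode_value`, `decode_value`, and `PropertyStore` are hypothetical names.

```python
import json
from datetime import date

# Supported type names from the list above, mapped to the Python type used here.
SUPPORTED_TYPES = {"string": str, "integer": int, "boolean": bool, "date": date, "email": str}


def encode_value(type_name: str, value) -> str:
    """Serialize a value as a tagged JSON object (assumed layout)."""
    if type_name not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type: {type_name}")
    expected = SUPPORTED_TYPES[type_name]
    # bool is a subclass of int in Python, so reject it explicitly for integers.
    if not isinstance(value, expected) or (type_name == "integer" and isinstance(value, bool)):
        raise TypeError(f"{value!r} is not a valid {type_name}")
    raw = value.isoformat() if isinstance(value, date) else value
    return json.dumps({"type": type_name, "value": raw})


def decode_value(payload: str):
    """Turn the stored JSON back into a Python value."""
    data = json.loads(payload)
    if data["type"] == "date":
        return date.fromisoformat(data["value"])
    return data["value"]


class PropertyStore:
    """In-memory stand-in for the properties table."""

    def __init__(self):
        self._rows = {}

    def add(self, member_id: int, property_type_id: int, type_name: str, value) -> None:
        key = (member_id, property_type_id)  # mirrors the composite unique constraint
        if key in self._rows:
            raise ValueError("member already has a property of this type")
        self._rows[key] = encode_value(type_name, value)

    def get(self, member_id: int, property_type_id: int):
        return decode_value(self._rows[(member_id, property_type_id)])


store = PropertyStore()
store.add(1, 10, "date", date(2024, 1, 31))
round_tripped = store.get(1, 10)

try:
    store.add(1, 10, "date", date(2025, 1, 1))
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```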

Indexes

Performance Indexes

members:

  • search_vector (GIN) - Full-text search (tsvector)
  • first_name (GIN trgm) - Fuzzy search on first name
  • last_name (GIN trgm) - Fuzzy search on last name
  • email (GIN trgm) - Fuzzy search on email
  • city (GIN trgm) - Fuzzy search on city
  • street (GIN trgm) - Fuzzy search on street
  • notes (GIN trgm) - Fuzzy search on notes
  • email (B-tree) - Exact email lookups
  • last_name (B-tree) - Name sorting
  • join_date (B-tree) - Date filtering
  • paid (partial B-tree) - Payment status queries

properties:

  • member_id - Member property lookups
  • property_type_id - Type-based queries
  • Composite (member_id, property_type_id) - Uniqueness

tokens:

  • subject - User token lookups
  • expires_at - Token cleanup
  • purpose - Purpose-based queries

users:

  • email (unique) - Login lookups
  • oidc_id (unique) - OIDC authentication
  • member_id (unique) - Member linkage

Full-Text Search

Implementation

  • Trigger: members_search_vector_trigger()
  • Function: Automatically updates search_vector on INSERT/UPDATE
  • Index Type: GIN (Generalized Inverted Index)

Weighted Fields

  • Weight A (highest): first_name, last_name
  • Weight B: email, notes
  • Weight C: birth_date, phone_number, city, street, house_number, postal_code
  • Weight D (lowest): join_date, exit_date
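
The effect of the weights can be illustrated with a deliberately simplified scoring function. This Python sketch is not PostgreSQL's `ts_rank` formula: it only sums a per-field weight for every field containing the search term, assuming the default weight values PostgreSQL uses for ranking (A = 1.0, B = 0.4, C = 0.2, D = 0.1). The function name `score` is hypothetical.

```python
# Field-to-weight mapping from the list above.
FIELD_WEIGHTS = {
    "first_name": "A", "last_name": "A",
    "email": "B", "notes": "B",
    "birth_date": "C", "phone_number": "C", "city": "C",
    "street": "C", "house_number": "C", "postal_code": "C",
    "join_date": "D", "exit_date": "D",
}

# Assumed numeric values: PostgreSQL's default ranking weights.
WEIGHT_VALUES = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}


def score(member: dict, term: str) -> float:
    """Sum the weights of all fields that contain the term as a whole word."""
    term = term.lower()
    total = 0.0
    for field, value in member.items():
        if field in FIELD_WEIGHTS and term in str(value).lower().split():
            total += WEIGHT_VALUES[FIELD_WEIGHTS[field]]
    return total


named_berlin = {"first_name": "Irving", "last_name": "Berlin", "city": "Hamburg"}
lives_in_berlin = {"first_name": "Anna", "last_name": "Schmidt", "city": "Berlin"}

name_score = score(named_berlin, "berlin")
city_score = score(lives_in_berlin, "berlin")
```

A match on a name field outranks the same term found in an address field, which is the ordering the weights are meant to produce.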

Usage Example

SELECT * FROM members 
WHERE search_vector @@ to_tsquery('simple', 'john & doe');

Fuzzy Search (Trigram-based)

Implementation

  • Extension: pg_trgm (PostgreSQL Trigram)
  • Index Type: GIN with gin_trgm_ops operator class
  • Similarity Threshold: 0.2 (application default, configurable; pg_trgm's own default for the % operator is 0.3)
  • Added: November 2025 (PR #187, closes #162)

How It Works

Fuzzy search combines multiple search strategies:

  1. Full-text search - Primary filter using tsvector
  2. Trigram similarity - similarity(field, query) > threshold
  3. Word similarity - word_similarity(query, field) > threshold
  4. Substring matching - LIKE and ILIKE for exact substrings
  5. Similarity operator - query % field for a quick similarity check

Fields with a trigram index:

  • first_name - GIN trigram index
  • last_name - GIN trigram index
  • email - GIN trigram index
  • city - GIN trigram index
  • street - GIN trigram index
  • notes - GIN trigram index
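
For readers unfamiliar with trigram matching, the following Python sketch approximates how pg_trgm's `similarity()` arrives at its score: each word is lowercased, padded with two leading spaces and one trailing space, split into three-character windows, and the two trigram sets are compared as shared trigrams divided by total distinct trigrams. It is an approximation for illustration only; word splitting in PostgreSQL depends on the database locale.

```python
import re


def trigrams(text: str) -> set:
    """Build the trigram set for a string, one padded word at a time."""
    result = set()
    # Runs of letters and digits are words; everything else separates words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


names = ["John", "Jon", "Joan", "Doe"]
loose_matches = [n for n in names if similarity("john", n) > 0.2]
strict_matches = [n for n in names if similarity("john", n) > 0.3]
```

With the 0.2 threshold, "Jon" and "Joan" still match "john"; raising the threshold to 0.3 leaves only the exact name.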

Usage Example (Ash Action)

# In LiveView or context
Member.fuzzy_search(Member, query: "john", similarity_threshold: 0.2)

# Or using Ash Query directly
Member
|> Ash.Query.for_read(:search, %{query: "john", similarity_threshold: 0.2})
|> Mv.Membership.read!()

Usage Example (SQL)

-- Trigram similarity search
SELECT * FROM members 
WHERE similarity(first_name, 'john') > 0.2
   OR similarity(last_name, 'doe') > 0.2
ORDER BY similarity(first_name, 'john') DESC;

-- Word similarity (better for partial matches)
SELECT * FROM members 
WHERE word_similarity('john', first_name) > 0.2;

-- Quick similarity check with % operator
SELECT * FROM members 
WHERE 'john' % first_name;

Performance Considerations

  • GIN indexes speed up trigram operations significantly
  • Similarity threshold of 0.2 balances precision and recall
  • Combined approach (FTS + trigram) provides best results
  • Lower threshold = more results but less specific

Database Extensions

Required PostgreSQL Extensions

  1. uuid-ossp

    • Purpose: UUID generation functions
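    • Used for: UUID generation functions such as uuid_generate_v4(). Note that gen_random_uuid() is built into PostgreSQL 13 and later, and uuid_generate_v7() is not provided by uuid-ossp, so neither depends on this extension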
  2. citext

    • Purpose: Case-insensitive text type
    • Used for: users.email (case-insensitive email matching)
  3. pg_trgm

    • Purpose: Trigram-based fuzzy text search and similarity matching
    • Used for: Fuzzy member search with similarity scoring
    • Operators and functions: % (similarity operator), similarity(), word_similarity()
    • Added in: Migration 20251001141005_add_trigram_to_members.exs

Installation

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS "citext";
CREATE EXTENSION IF NOT EXISTS "pg_trgm";

Migration Strategy

Ash Migrations

This project uses Ash Framework's migration system:

# Generate new migration
mix ash.codegen --name add_new_feature

# Apply migrations
mix ash.setup

# Rollback migrations
mix ash_postgres.rollback -n 1

Migration Files Location

priv/repo/migrations/
├── 20250421101957_initialize_extensions_1.exs
├── 20250528163901_initial_migration.exs
├── 20250617090641_member_fields.exs
├── 20250620110850_add_accounts_domain.exs
├── 20250912085235_AddSearchVectorToMembers.exs
├── 20250926180341_add_unique_email_to_members.exs
├── 20251001141005_add_trigram_to_members.exs
└── 20251016130855_add_constraints_for_user_member_and_property.exs

Data Integrity

Foreign Key Behaviors

| Relationship | On Delete | Rationale |
|---|---|---|
| users.member_id → members.id | SET NULL | Preserve user account when member deleted |
| properties.member_id → members.id | CASCADE | Delete properties with member |
| properties.property_type_id → property_types.id | RESTRICT | Prevent deletion of types in use |
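
The three delete behaviors can be tried out without a PostgreSQL instance. The sketch below uses Python's built-in `sqlite3` module with heavily simplified tables (integer keys instead of UUIDs, most columns omitted), so it demonstrates the foreign key semantics only and is not the project's actual DDL. The property type name `shirt_size` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE members (id INTEGER PRIMARY KEY, email TEXT UNIQUE);
CREATE TABLE property_types (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  email TEXT UNIQUE,
  member_id INTEGER UNIQUE REFERENCES members(id) ON DELETE SET NULL
);
CREATE TABLE properties (
  id INTEGER PRIMARY KEY,
  member_id INTEGER NOT NULL REFERENCES members(id) ON DELETE CASCADE,
  property_type_id INTEGER NOT NULL REFERENCES property_types(id) ON DELETE RESTRICT,
  UNIQUE (member_id, property_type_id)
);
""")

conn.execute("INSERT INTO members (id, email) VALUES (1, 'a@example.org'), (2, 'b@example.org')")
conn.execute("INSERT INTO property_types (id, name) VALUES (1, 'shirt_size')")
conn.execute("INSERT INTO users (id, email, member_id) VALUES (1, 'a@example.org', 1)")
conn.execute("INSERT INTO properties (member_id, property_type_id) VALUES (1, 1), (2, 1)")

# Deleting member 1 keeps the user (SET NULL) and removes that member's property (CASCADE).
conn.execute("DELETE FROM members WHERE id = 1")
user_member_id = conn.execute("SELECT member_id FROM users WHERE id = 1").fetchone()[0]
remaining_properties = conn.execute("SELECT COUNT(*) FROM properties").fetchone()[0]

# The type is still used by member 2's property, so RESTRICT blocks the delete.
try:
    conn.execute("DELETE FROM property_types WHERE id = 1")
    restrict_blocked = False
except sqlite3.IntegrityError:
    restrict_blocked = True
```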

Validation Layers

  1. Database Level:

    • CHECK constraints
    • NOT NULL constraints
    • UNIQUE indexes
    • Foreign key constraints
  2. Application Level (Ash):

    • Custom validators
    • Email format validation (EctoCommons.EmailValidator)
    • Business rule validation
    • Cross-entity validation
  3. UI Level:

    • Client-side form validation
    • Real-time feedback
    • Error messages

Performance Considerations

Query Patterns

High Frequency:

  • Member search (uses GIN index on search_vector)
  • Member list with filters (uses indexes on join_date, paid)
  • User authentication (uses unique index on email/oidc_id)
  • Property lookups by member (uses index on member_id)

Medium Frequency:

  • Member CRUD operations
  • Property updates
  • Token validation

Low Frequency:

  • PropertyType management
  • User-Member linking
  • Bulk operations

Optimization Tips

  1. Use indexes: All critical query paths have indexes
  2. Preload relationships: Use Ash's load to avoid N+1
  3. Pagination: Use keyset pagination (configured by default)
  4. Partial indexes: The members.paid index covers only non-NULL values
  5. Search optimization: Full-text search via tsvector, not LIKE
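
Keyset pagination, mentioned in tip 3, avoids OFFSET by continuing from the last row of the previous page. The sketch below shows the general technique with Python's `sqlite3` module and a two-column sort key; it is not how Ash encodes its keysets, and the table contents and the `fetch_page` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, last_name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO members (id, last_name) VALUES (?, ?)",
    [(1, "Meyer"), (2, "Bauer"), (3, "Schmidt"), (4, "Bauer"), (5, "Weber")],
)


def fetch_page(conn, limit, after=None):
    """Return up to `limit` rows that sort strictly after the (last_name, id) key."""
    if after is None:
        sql = "SELECT last_name, id FROM members ORDER BY last_name, id LIMIT ?"
        params = (limit,)
    else:
        last_name, last_id = after
        # id breaks ties between equal last names, so no row is skipped or repeated.
        sql = (
            "SELECT last_name, id FROM members "
            "WHERE last_name > ? OR (last_name = ? AND id > ?) "
            "ORDER BY last_name, id LIMIT ?"
        )
        params = (last_name, last_name, last_id, limit)
    return conn.execute(sql, params).fetchall()


pages = []
cursor = None
while True:
    page = fetch_page(conn, 2, cursor)
    if not page:
        break
    pages.append(page)
    cursor = page[-1]  # the last row of this page is the key for the next one
```

Because each page is located through the sort key rather than a row offset, the query cost stays roughly constant no matter how deep the user pages.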

Visualization

Using dbdiagram.io

  1. Visit https://dbdiagram.io
  2. Click "Import" → "From file"
  3. Upload database_schema.dbml
  4. View interactive diagram with relationships

Using dbdocs.io

  1. Install dbdocs CLI: npm install -g dbdocs
  2. Generate docs: dbdocs build database_schema.dbml
  3. View generated documentation

VS Code Extension

Install "DBML Language" extension to view/edit DBML files with:

  • Syntax highlighting
  • Inline documentation
  • Error checking

Security Considerations

Sensitive Data

Hashed:

  • users.hashed_password (bcrypt)

Should Not Log:

  • hashed_password
  • tokens (jti, purpose, extra_data)

Personal Data (GDPR):

  • All member fields (name, email, birth_date, address)
  • User email
  • Token subject

Access Control

  • Implement through Ash policies
  • Consider row-level security in the future
  • Audit logging for sensitive operations

Backup Recommendations

Critical Tables (Priority 1)

  • members - Core business data
  • users - Authentication data
  • property_types - Schema definitions

Important Tables (Priority 2)

  • properties - Member custom data
  • tokens - Can be regenerated, but still worth backing up

Backup Strategy

# Full database backup
pg_dump -Fc mv_prod > backup_$(date +%Y%m%d).dump

# Restore
pg_restore -d mv_prod backup_20251110.dump

Testing

Test Database

  • Separate test database: mv_test
  • Sandbox mode via Ecto.Adapters.SQL.Sandbox
  • Reset between tests

Seed Data

# Load seed data
mix run priv/repo/seeds.exs

Future Considerations

Potential Additions

  1. Audit Log Table

    • Track changes to members
    • Compliance and history tracking
  2. Payment Tracking

    • Payment history table
    • Transaction records
    • Fee calculation
  3. Document Storage

    • Member documents/attachments
    • File metadata table
  4. Email Queue

    • Outbound email tracking
    • Delivery status
  5. Roles & Permissions

    • User roles (admin, treasurer, member)
    • Permission management


Last Updated: 2025-11-13
Schema Version: 1.1
Database: PostgreSQL 17.6 (dev) / 16 (prod)