diff --git a/docs/database-schema-readme.md b/docs/database-schema-readme.md index ab1d76d..eefb608 100644 --- a/docs/database-schema-readme.md +++ b/docs/database-schema-readme.md @@ -132,11 +132,17 @@ Member (1) → (N) Properties ### Performance Indexes **members:** -- `search_vector` (GIN) - Full-text search -- `email` - Email lookups -- `last_name` - Name sorting -- `join_date` - Date filtering -- `paid` (partial) - Payment status queries +- `search_vector` (GIN) - Full-text search (tsvector) +- `first_name` (GIN trgm) - Fuzzy search on first name +- `last_name` (GIN trgm) - Fuzzy search on last name +- `email` (GIN trgm) - Fuzzy search on email +- `city` (GIN trgm) - Fuzzy search on city +- `street` (GIN trgm) - Fuzzy search on street +- `notes` (GIN trgm) - Fuzzy search on notes +- `email` (B-tree) - Exact email lookups +- `last_name` (B-tree) - Name sorting +- `join_date` (B-tree) - Date filtering +- `paid` (partial B-tree) - Payment status queries **properties:** - `member_id` - Member property lookups @@ -172,6 +178,64 @@ SELECT * FROM members WHERE search_vector @@ to_tsquery('simple', 'john & doe'); ``` +## Fuzzy Search (Trigram-based) + +### Implementation +- **Extension:** `pg_trgm` (PostgreSQL Trigram) +- **Index Type:** GIN with `gin_trgm_ops` operator class +- **Similarity Threshold:** 0.2 (default, configurable) +- **Added:** November 2025 (PR #187, closes #162) + +### How It Works +Fuzzy search combines multiple search strategies: +1. **Full-text search** - Primary filter using tsvector +2. **Trigram similarity** - `similarity(field, query) > threshold` +3. **Word similarity** - `word_similarity(query, field) > threshold` +4. **Substring matching** - `LIKE` and `ILIKE` for exact substrings +5. 
**Similarity operator** - `query % field` (pg_trgm's `%`, not arithmetic modulo) for a quick similarity check + +### Indexed Fields for Fuzzy Search +- `first_name` - GIN trigram index +- `last_name` - GIN trigram index +- `email` - GIN trigram index +- `city` - GIN trigram index +- `street` - GIN trigram index +- `notes` - GIN trigram index + +### Usage Example (Ash Action) +```elixir +# In LiveView or context +Member.fuzzy_search(Member, query: "john", similarity_threshold: 0.2) + +# Or using Ash Query directly +Member +|> Ash.Query.for_read(:search, %{query: "john", similarity_threshold: 0.2}) +|> Mv.Membership.read!() +``` + +### Usage Example (SQL) +```sql +-- Trigram similarity search +SELECT * FROM members +WHERE similarity(first_name, 'john') > 0.2 + OR similarity(last_name, 'doe') > 0.2 +ORDER BY similarity(first_name, 'john') DESC; + +-- Word similarity (better for partial matches) +SELECT * FROM members +WHERE word_similarity('john', first_name) > 0.2; + +-- Quick similarity check with % operator (compares against pg_trgm.similarity_threshold, PostgreSQL default 0.3) +SELECT * FROM members +WHERE 'john' % first_name; +``` + +### Performance Considerations +- **GIN trigram indexes** speed up `%`, `LIKE`, and `ILIKE` matching; a bare `similarity(...) > x` filter is evaluated without index support +- **Similarity threshold** of 0.2 balances precision and recall +- **Combined approach** (FTS + trigram) provides best results +- A lower threshold returns more results, but less specific ones + ## Database Extensions ### Required PostgreSQL Extensions @@ -184,10 +248,17 @@ WHERE search_vector @@ to_tsquery('simple', 'john & doe'); - Purpose: Case-insensitive text type - Used for: `users.email` (case-insensitive email matching) +3. 
**pg_trgm** + - Purpose: Trigram-based fuzzy text search and similarity matching + - Used for: Fuzzy member search with similarity scoring + - Operators and functions: `%` (similarity operator), `word_similarity()`, `similarity()` + - Added in: Migration `20251001141005_add_trigram_to_members.exs` + ### Installation ```sql CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; CREATE EXTENSION IF NOT EXISTS "citext"; +CREATE EXTENSION IF NOT EXISTS "pg_trgm"; ``` ## Migration Strategy @@ -215,6 +286,7 @@ priv/repo/migrations/ ├── 20250620110850_add_accounts_domain.exs ├── 20250912085235_AddSearchVectorToMembers.exs ├── 20250926180341_add_unique_email_to_members.exs +├── 20251001141005_add_trigram_to_members.exs └── 20251016130855_add_constraints_for_user_member_and_property.exs ``` @@ -386,7 +458,7 @@ mix run priv/repo/seeds.exs --- -**Last Updated:** 2025-11-10 -**Schema Version:** 1.0 +**Last Updated:** 2025-11-13 +**Schema Version:** 1.1 **Database:** PostgreSQL 17.6 (dev) / 16 (prod) diff --git a/docs/database_schema.dbml b/docs/database_schema.dbml index a536d26..b414cf9 100644 --- a/docs/database_schema.dbml +++ b/docs/database_schema.dbml @@ -6,8 +6,8 @@ // - https://dbdocs.io // - VS Code Extensions: "DBML Language" or "dbdiagram.io" // -// Version: 1.0 -// Last Updated: 2025-11-10 +// Version: 1.1 +// Last Updated: 2025-11-13 Project mila_membership_management { database_type: 'PostgreSQL' @@ -17,15 +17,21 @@ Project mila_membership_management { A membership management application for small to mid-sized clubs. 
## Key Features: - - User authentication (OIDC + Password) + - User authentication (OIDC + Password with secure account linking) - Member management with flexible custom properties - Bidirectional email synchronization between users and members - - Full-text search capabilities + - Full-text search capabilities (tsvector) + - Fuzzy search with trigram matching (pg_trgm) - GDPR-compliant data management ## Domains: - **Accounts**: User authentication and session management - **Membership**: Club member data and custom properties + + ## Required PostgreSQL Extensions: + - uuid-ossp (UUID generation) + - citext (case-insensitive text) + - pg_trgm (trigram-based fuzzy search) ''' } @@ -130,10 +136,16 @@ Table members { indexes { email [unique, name: 'members_unique_email_index'] - search_vector [type: gin, name: 'members_search_vector_idx', note: 'GIN index for full-text search'] - email [name: 'members_email_idx'] - last_name [name: 'members_last_name_idx', note: 'For name sorting'] - join_date [name: 'members_join_date_idx', note: 'For date filters'] + search_vector [type: gin, name: 'members_search_vector_idx', note: 'GIN index for full-text search (tsvector)'] + first_name [type: gin, name: 'members_first_name_trgm_idx', note: 'GIN trigram index for fuzzy search'] + last_name [type: gin, name: 'members_last_name_trgm_idx', note: 'GIN trigram index for fuzzy search'] + email [type: gin, name: 'members_email_trgm_idx', note: 'GIN trigram index for fuzzy search'] + city [type: gin, name: 'members_city_trgm_idx', note: 'GIN trigram index for fuzzy search'] + street [type: gin, name: 'members_street_trgm_idx', note: 'GIN trigram index for fuzzy search'] + notes [type: gin, name: 'members_notes_trgm_idx', note: 'GIN trigram index for fuzzy search'] + email [name: 'members_email_idx', note: 'B-tree index for exact lookups'] + last_name [name: 'members_last_name_idx', note: 'B-tree index for name sorting'] + join_date [name: 'members_join_date_idx', note: 'B-tree index for 
date filters'] (paid) [name: 'members_paid_idx', type: btree, note: 'Partial index WHERE paid IS NOT NULL'] } @@ -152,10 +164,17 @@ Table members { - Subsequent changes to either email sync bidirectionally - Validates that email is not already used by another unlinked user - **Full-Text Search:** - - `search_vector` is auto-updated via trigger - - Weighted fields: first_name (A), last_name (A), email (B), notes (B) - - Supports flexible member search across multiple fields + **Search Capabilities:** + 1. Full-Text Search (tsvector): + - `search_vector` is auto-updated via trigger + - Weighted fields: first_name (A), last_name (A), email (B), notes (B) + - GIN index for fast text search + + 2. Fuzzy Search (pg_trgm): + - Trigram-based similarity matching + - 6 GIN trigram indexes on searchable fields + - Configurable similarity threshold (default 0.2) + - Supports typos and partial matches **Relationships:** - Optional 1:1 with users (0..1 ↔ 0..1) - authentication account diff --git a/docs/development-progress-log.md b/docs/development-progress-log.md index aa3795b..0022631 100644 --- a/docs/development-progress-log.md +++ b/docs/development-progress-log.md @@ -227,6 +227,108 @@ attribute :search_vector, AshPostgres.Tsvector, --- +#### Phase 6: Search Enhancement & OIDC Improvements (Sprint 9) + +**Sprint 9 - 01.11 - 13.11 (finalized)** + +**PR #187:** *Implement fuzzy search* (closes #162) 🔍 +- PostgreSQL `pg_trgm` extension for trigram-based fuzzy search +- 6 new GIN trigram indexes on members table: + - first_name, last_name, email, city, street, notes +- Combined search strategy: Full-text (tsvector) + Trigram similarity +- Configurable similarity threshold (default 0.2) +- Migration: `20251001141005_add_trigram_to_members.exs` +- 443 lines of comprehensive tests + +**Key learnings:** +- Trigram indexes significantly improve fuzzy matching +- Combined FTS + trigram provides best user experience +- word_similarity() better for partial word matching than 
similarity() +- Similarity threshold of 0.2 balances precision and recall + +**Implementation highlights:** +```elixir +# New Ash action: :search with fuzzy matching +read :search do + argument :query, :string, allow_nil?: true + argument :similarity_threshold, :float, allow_nil?: true + # Uses fragment() for pg_trgm operators: %, similarity(), word_similarity() +end + +# Public function for LiveView usage +def fuzzy_search(query, opts) do + # opts is a keyword list, e.g. [query: "john"] + query_string = Keyword.get(opts, :query) + Ash.Query.for_read(query, :search, %{query: query_string}) +end +``` + +--- + +**PR #192:** *OIDC handling and linking* (closes #171) 🔐 +- Secure OIDC account linking with password verification +- Security fix: Filter OIDC sign-in by `oidc_id` instead of email +- New custom error: `PasswordVerificationRequired` +- New validation: `OidcEmailCollision` for email conflict detection +- New LiveView: `LinkOidcAccountLive` for interactive linking +- Automatic linking for passwordless users (no password prompt) +- Password verification required for password-protected accounts +- Comprehensive security logging for audit trail +- Locale persistence via secure cookie (1 year TTL) +- Documentation: `docs/oidc-account-linking.md` + +**Security improvements:** +- Prevents account takeover via OIDC email matching +- Password verification before linking OIDC to password accounts +- All linking attempts logged with appropriate severity +- CSRF protection on linking forms +- Secure cookie flags: `http_only`, `secure`, `same_site: "Lax"` + +**Test coverage:** +- 5 new comprehensive test files (1,793 lines total): + - `user_authentication_test.exs` (265 lines) + - `oidc_e2e_flow_test.exs` (415 lines) + - `oidc_email_update_test.exs` (271 lines) + - `oidc_password_linking_test.exs` (496 lines) + - `oidc_passwordless_linking_test.exs` (210 lines) +- Extended `oidc_integration_test.exs` (+136 lines) + +**Key learnings:** +- Account linking requires careful security considerations +- Passwordless users should be auto-linked (better UX) +- Audit 
logging essential for security-critical operations +- Locale persistence improves user experience post-logout + +--- + +**PR #193:** *Docs, Code Guidelines and Progress Log* 📚 +- Complete project documentation suite (5,554 lines) +- New documentation files: + - `CODE_GUIDELINES.md` (2,578 lines) - Comprehensive development guidelines + - `docs/database-schema-readme.md` (392 lines) - Database documentation + - `docs/database_schema.dbml` (329 lines) - DBML schema definition + - `docs/development-progress-log.md` (1,227 lines) - This file + - `docs/feature-roadmap.md` (743 lines) - Feature planning and roadmap +- Reduced redundancy in README.md (links to detailed docs) +- Cross-referenced documentation for easy navigation + +--- + +**PR #201:** *Code documentation and refactoring* 🔧 +- @moduledoc for ALL modules (51 modules documented) +- @doc for all public functions +- Enabled Credo `ModuleDoc` check (enforces documentation standards) +- Refactored complex functions: + - `MemberLive.Index.handle_event/3` - Split sorting logic into smaller functions + - `AuthController.handle_auth_failure/2` - Reduced cyclomatic complexity +- Documentation coverage: 100% for core modules + +**Key learnings:** +- @moduledoc enforcement improves code maintainability +- Refactoring complex functions improves readability +- Documentation should explain "why" not just "what" +- Credo helps maintain consistent code quality + +--- + ## Implementation Decisions ### Architecture Patterns @@ -369,9 +471,11 @@ end - ✅ Consistent styling - ✅ Mobile-responsive out of the box -#### 7. Full-Text Search Implementation +#### 7. 
Search Implementation (Full-Text + Fuzzy) -**PostgreSQL tsvector + GIN Index** +**Two-Tiered Search Strategy:** + +**A) Full-Text Search (tsvector + GIN Index)** ```sql -- Auto-updating trigger @@ -389,16 +493,40 @@ END $$ LANGUAGE plpgsql; ``` +**B) Fuzzy Search (pg_trgm + Trigram GIN Indexes)** + +Added November 2025 (PR #187): + +```elixir +# Ash action combining FTS + trigram similarity +read :search do + argument :query, :string + argument :similarity_threshold, :float + + prepare fn query, _ctx -> + # 1. Full-text search (tsvector) + # 2. Trigram similarity (%, similarity(), word_similarity()) + # 3. Substring matching (contains, ilike) + end +end +``` + +**6 Trigram Indexes:** +- first_name, last_name, email, city, street, notes +- GIN index with `gin_trgm_ops` operator class + **Reasoning:** -- Native PostgreSQL feature (no external service) -- Fast with GIN index -- Weighted fields (names more important than dates) +- Native PostgreSQL features (no external service) +- Combined approach handles typos + partial matches +- Fast with GIN indexes - Simple lexer (no German stemming initially) +- Similarity threshold configurable (default 0.2) **Why not Elasticsearch/Meilisearch?** - Overkill for small to mid-sized clubs - Additional infrastructure complexity -- PostgreSQL full-text sufficient for 10k+ members +- PostgreSQL full-text + fuzzy sufficient for 10k+ members +- Better integration with existing stack ### Deviations from Initial Plans @@ -470,7 +598,8 @@ end 3. `20250620110850_add_accounts_domain.exs` - Users & tokens tables 4. `20250912085235_AddSearchVectorToMembers.exs` - Full-text search (tsvector + GIN index) 5. `20250926164519_member_relation.exs` - User-Member link (optional 1:1) -6. `20251016130855_add_constraints_for_user_member_and_property.exs` - Email sync constraints +6. `20251001141005_add_trigram_to_members.exs` - Fuzzy search (pg_trgm + 6 GIN trigram indexes) +7. 
`20251016130855_add_constraints_for_user_member_and_property.exs` - Email sync constraints **Learning:** Ash's code generation from resources ensures schema always matches code. @@ -1220,8 +1349,8 @@ This project demonstrates a modern Phoenix application built with: --- -**Document Version:** 1.0 -**Last Updated:** 2025-11-10 +**Document Version:** 1.1 +**Last Updated:** 2025-11-13 **Maintainer:** Development Team **Status:** Living Document (update as project evolves) diff --git a/docs/feature-roadmap.md b/docs/feature-roadmap.md index 4768089..5ffd980 100644 --- a/docs/feature-roadmap.md +++ b/docs/feature-roadmap.md @@ -26,9 +26,14 @@ - ✅ Password-based authentication - ✅ User sessions and tokens - ✅ Basic authentication flows +- ✅ **OIDC account linking with password verification** (PR #192, closes #171) +- ✅ **Secure OIDC email collision handling** (PR #192) +- ✅ **Automatic linking for passwordless users** (PR #192) + +**Closed Issues:** +- ✅ [#171](https://git.local-it.org/local-it/mitgliederverwaltung/issues/171) - OIDC handling and linking (closed 2025-11-13) **Open Issues:** -- [#171](https://git.local-it.org/local-it/mitgliederverwaltung/issues/171) - Ensure correct handling of Password login vs OIDC login (M) - [#146](https://git.local-it.org/local-it/mitgliederverwaltung/issues/146) - Translate "or" in the login screen (Low) - [#144](https://git.local-it.org/local-it/mitgliederverwaltung/issues/144) - Add language switch dropdown to login screen (Low) @@ -54,20 +59,24 @@ - ✅ Address management - ✅ Membership status tracking - ✅ Full-text search (PostgreSQL tsvector) +- ✅ **Fuzzy search with trigram matching** (PR #187, closes #162) +- ✅ **Combined FTS + trigram search** (PR #187) +- ✅ **6 GIN trigram indexes** for fuzzy matching (PR #187) - ✅ Sorting by basic fields - ✅ User-Member linking (optional 1:1) - ✅ Email synchronization between User and Member +**Closed Issues:** +- ✅ [#162](https://git.local-it.org/local-it/mitgliederverwaltung/issues/162) - 
Fuzzy and substring search (closed 2025-11-12) + **Open Issues:** - [#169](https://git.local-it.org/local-it/mitgliederverwaltung/issues/169) - Allow combined creation of Users/Members (M, Low priority) - [#168](https://git.local-it.org/local-it/mitgliederverwaltung/issues/168) - Allow user-member association in edit/create views (M, High priority) - [#165](https://git.local-it.org/local-it/mitgliederverwaltung/issues/165) - Pagination for list of members (S, Low priority) -- [#162](https://git.local-it.org/local-it/mitgliederverwaltung/issues/162) - Implement fuzzy and substring search (M, Medium priority) - [#160](https://git.local-it.org/local-it/mitgliederverwaltung/issues/160) - Implement clear icon in searchbar (S, Low priority) - [#154](https://git.local-it.org/local-it/mitgliederverwaltung/issues/154) - Concept advanced search (Low priority, needs refinement) **Missing Features:** -- ❌ Fuzzy search - ❌ Advanced filters (date ranges, multiple criteria) - ❌ Pagination (currently all members loaded) - ❌ Bulk operations (bulk delete, bulk update) @@ -367,8 +376,8 @@ | Feature Area | Current Status | Priority | Complexity | |--------------|----------------|----------|------------| -| **Authentication & Authorization** | 40% complete | **High** | Medium | -| **Member Management** | 70% complete | **High** | Low-Medium | +| **Authentication & Authorization** | 60% complete | **High** | Medium | +| **Member Management** | 85% complete | **High** | Low-Medium | | **Custom Fields** | 50% complete | **High** | Medium | | **User Management** | 60% complete | Medium | Low | | **Navigation & UX** | 50% complete | Medium | Low | @@ -388,12 +397,12 @@ ### Open Milestones (From Issues) 1. ✅ **Ich kann einen neuen Kontakt anlegen** (Closed) -2. 🔄 **I can search through the list of members - fulltext** (Open) - Related: #162, #154 +2. ✅ **I can search through the list of members - fulltext** (Closed) - #162 implemented (Fuzzy Search), #154 needs refinement 3. 
🔄 **I can sort the list of members for specific fields** (Open) - Related: #153 4. 🔄 **We have a intuitive navigation structure** (Open) 5. 🔄 **We have different roles and permissions** (Open) - Related: #191, #190, #151 6. 🔄 **As Admin I can configure settings globally** (Open) -7. 🔄 **Accounts & Logins** (Open) - Related: #171, #169, #168 +7. ✅ **Accounts & Logins** (Partially closed) - #171 implemented (OIDC linking), #169/#168 still open 8. 🔄 **I can add custom fields** (Open) - Related: #194, #157, #161 9. 🔄 **Import transactions via vereinfacht API** (Open) - Related: #156 10. 🔄 **We have a staging environment** (Open)