From 828e09dce14480831d0ba1ab1cd826d72489e1b9 Mon Sep 17 00:00:00 2001 From: carla Date: Tue, 23 Dec 2025 18:12:05 +0100 Subject: [PATCH] docs: adds implementation plan --- docs/csv-member-import-v1.md | 611 +++++++++++++++++++++++++++++++++++ 1 file changed, 611 insertions(+) create mode 100644 docs/csv-member-import-v1.md diff --git a/docs/csv-member-import-v1.md b/docs/csv-member-import-v1.md new file mode 100644 index 0000000..30409b8 --- /dev/null +++ b/docs/csv-member-import-v1.md @@ -0,0 +1,611 @@ +# CSV Member Import v1 - Implementation Plan + +**Version:** 1.0 +**Date:** 2025-01-XX +**Status:** Ready for Implementation +**Related Documents:** +- [Feature Roadmap](./feature-roadmap.md) - Overall feature planning + +--- + +## Table of Contents + +- [Overview & Scope](#overview--scope) +- [UX Flow](#ux-flow) +- [CSV Specification](#csv-specification) +- [Technical Design Notes](#technical-design-notes) +- [Implementation Issues](#implementation-issues) +- [Rollout & Risks](#rollout--risks) + +--- + +## Overview & Scope + +### What We're Building + +A **basic CSV member import feature** that allows administrators to upload a CSV file and import new members into the system. This is a **v1 minimal implementation** focused on establishing the import structure without advanced features. + +**Core Functionality (v1 Minimal):** +- Upload CSV file via LiveView file upload +- Parse CSV with bilingual header support for core member fields (English/German) +- Auto-detect delimiter (`;` or `,`) using header recognition +- Map CSV columns to core member fields (`first_name`, `last_name`, `email`, `phone_number`, `street`, `postal_code`, `city`) +- Validate each row (required fields: `first_name`, `last_name`, `email`) +- Create members via Ash resource (one-by-one, **no background jobs**, processed in chunks of 200 rows via LiveView messages) +- Display import results: success count, error count, and error details +- Provide static CSV templates (EN/DE) + +**Optional Enhancement (v1.1 - Last Issue):** +- Custom field import (if time permits, otherwise defer to v2) + +**Key Constraints (v1):** +- ✅ **Admin-only feature** +- ✅ **No upsert** (create only) +- ✅ **No deduplication** (duplicate emails fail and show as errors) +- ✅ **No mapping wizard** (fixed header mapping via bilingual variants) +- ✅ **No background jobs** (progress via LiveView `handle_info`) +- ✅ **Best-effort import** (row-by-row, no rollback) +- ✅ **UI-only error display** (no error CSV export) +- ✅ **Safety limits** (10 MB, 1,000 rows, chunks of 200) + +### Out of Scope (v1) + +**Deferred to Future Versions:** +- ❌ Upsert/update existing members +- ❌ Advanced deduplication strategies +- ❌ Column mapping wizard UI +- ❌ Background job processing (Oban/GenStage) +- ❌ Transactional all-or-nothing import +- ❌ Error CSV export/download +- ⚠️ Custom field import (optional, last issue - defer to v2 if scope is tight) +- ❌ Batch validation preview before import +- ❌ Date/boolean field parsing +- ❌ Dynamic template generation +- ❌ Import history/audit log +- ❌ Import templates for other entities + +--- + +## UX Flow + +### Access & Location + +**Entry Point:** +- **Location:** Global Settings page (`/settings`) +- **UI Element:** New section "Import Members (CSV)" below "Custom Fields" section +- **Access Control:** Admin-only (enforced at LiveView event level, not entire `/settings` route) + +### User Journey + +1. **Navigate to Global Settings** +2. **Access Import Section** + - Upload area (drag & drop or file picker) + - Template download links (English / German) + - Help text explaining CSV format +3. **Download Template (Optional)** +4. **Prepare CSV File** +5. **Upload CSV** +6. **Start Import** + - Runs server-side via LiveView messages (may take up to ~30 seconds for large files) +7. **View Results** + - Success count + - Error count + - First 50 errors, each with: + - **CSV line number** (header is line 1, first data record begins at line 2) + - Error message + - Field name (if applicable) + +### Error Handling + +- **File too large:** Flash error before upload starts +- **Too many rows:** Flash error before import starts +- **Invalid CSV format:** Error shown in results +- **Partial success:** Results show both success and error counts + +--- + +## CSV Specification + +### Delimiter + +**Recommended:** Semicolon (`;`) +**Supported:** `;` and `,` + +**Auto-Detection (Header Recognition):** +- Remove UTF-8 BOM *first* +- Extract header record and try parsing with both delimiters +- For each delimiter, count how many recognized headers are present (via normalized variants) +- Choose delimiter with higher recognition; prefer `;` if tied +- If neither yields recognized headers, default to `;` + +### Quoting Rules + +- Fields may be quoted with double quotes (`"`) +- Escaped quotes: `""` inside quoted field represents a single `"` +- **v1 assumption:** CSV records do **not** contain embedded newlines inside quoted fields. (If they do, parsing may fail or line numbers may be inaccurate.) + +### Column Headers + +**v1 Supported Fields (Core Member Fields Only):** +- `first_name` / `Vorname` (required) +- `last_name` / `Nachname` (required) +- `email` / `E-Mail` (required) +- `phone_number` / `Telefon` (optional) +- `street` / `Straße` (optional) +- `postal_code` / `PLZ` / `Postleitzahl` (optional) +- `city` / `Stadt` (optional) + +**Member Field Header Mapping:** + +| Canonical Field | English Variants | German Variants | +|---|---|---| +| `first_name` | `first_name`, `firstname` | `Vorname`, `vorname` | +| `last_name` | `last_name`, `lastname`, `surname` | `Nachname`, `nachname`, `Familienname` | +| `email` | `email`, `e-mail`, `e_mail` | `E-Mail`, `e-mail`, `e_mail` | +| `phone_number` | `phone_number`, `phone`, `telephone` | `Telefon`, `telefon` | +| `street` | `street`, `address` | `Straße`, `strasse`, `Strasse` | +| `postal_code` | `postal_code`, `zip`, `postcode` | `PLZ`, `plz`, `Postleitzahl`, `postleitzahl` | +| `city` | `city`, `town` | `Stadt`, `stadt`, `Ort` | + +**Header Normalization (used consistently for both input headers AND mapping variants):** +- Trim whitespace +- Convert to lowercase +- Normalize Unicode: `ß` → `ss` (e.g., `Straße` → `strasse`) +- Replace hyphens/whitespace with underscores: `E-Mail` → `e_mail`, `phone number` → `phone_number` +- Collapse multiple underscores: `e__mail` → `e_mail` +- Case-insensitive matching + +**Unknown columns:** ignored (no error) + +**Required fields:** `first_name`, `last_name`, `email` + +### CSV Template Files + +**Location:** +- `priv/static/templates/member_import_en.csv` +- `priv/static/templates/member_import_de.csv` + +**Content:** +- Header row with required + common optional fields +- One example row +- Uses semicolon delimiter (`;`) +- UTF-8 encoding **with BOM** (Excel compatibility) + +**Template Access:** +- Templates are static files in `priv/static/templates/` +- Served at: + - `/templates/member_import_en.csv` + - `/templates/member_import_de.csv` +- In LiveView, link them using Phoenix static path helpers (e.g. `~p` or `Routes.static_path/2`, depending on Phoenix version). + +### File Limits + +- **Max file size:** 10 MB +- **Max rows:** 1,000 rows (excluding header) +- **Processing:** chunks of 200 (via LiveView messages) +- **Encoding:** UTF-8 (BOM handled) + +--- + +## Technical Design Notes + +### Architecture Overview + +``` +┌─────────────────┐ +│ LiveView UI │ (GlobalSettingsLive or component) +│ - Upload area │ +│ - Progress │ +│ - Results │ +└────────┬────────┘ + │ prepare + ▼ +┌─────────────────────────────┐ +│ Import Service │ (Mv.Membership.Import.MemberCSV) +│ - parse + map + limit checks│ -> returns import_state +│ - process_chunk(chunk) │ -> returns chunk results +└────────┬────────────────────┘ + │ create + ▼ +┌─────────────────┐ +│ Ash Resource │ (Mv.Membership.Member) +│ - Create │ +└─────────────────┘ +``` + +### Technology Stack + +- **Phoenix LiveView:** file upload via `allow_upload/3` +- **NimbleCSV:** CSV parsing (add explicit dependency if missing) +- **Ash Resource:** member creation via `Membership.create_member/1` +- **Gettext:** bilingual UI/error messages + +### Module Structure + +**New Modules:** +- `lib/mv/membership/import/member_csv.ex` - import orchestration + chunk processing +- `lib/mv/membership/import/csv_parser.ex` - delimiter detection + parsing + BOM handling +- `lib/mv/membership/import/header_mapper.ex` - normalization + header mapping + +**Modified Modules:** +- `lib/mv_web/live/global_settings_live.ex` - render import section, handle upload/events/messages + +### Data Flow + +1. **Upload:** LiveView receives file via `allow_upload` +2. **Consume:** `consume_uploaded_entries/3` reads file content +3. **Prepare:** `MemberCSV.prepare/2` + - Strip BOM + - Detect delimiter (header recognition) + - Parse header + rows + - Map headers to canonical fields + - Early abort if required headers missing + - Row count check + - Return `import_state` containing chunks and metadata +4. **Process:** LiveView drives chunk processing via `handle_info` + - For each chunk: validate + create + collect errors +5. **Results:** LiveView shows progress + final summary + +### Types & Key Consistency + +- **Raw CSV parsing:** returns headers as list of strings, and rows **with csv line numbers** +- **Header mapping:** operates on normalized strings; mapping table variants are normalized once +- **Ash attrs:** built as atom-keyed map (`%{first_name: ..., ...}`) + +### Error Model + +```elixir +%{ + csv_line_number: 5, # physical line number in the CSV file + field: :email, # optional + message: "is not a valid email" +} +``` + +### CSV Line Numbers (Important) + +To keep error reporting user-friendly and accurate, **row errors must reference the physical line number in the original file**, even if empty lines are skipped. + +**Design decision:** the parser returns rows as: + +```elixir +rows :: [{csv_line_number :: pos_integer(), row_map :: map()}] +``` + +Downstream logic must **not** recompute line numbers from row indexes. + +### Authorization + +**Enforcement points:** +1. **LiveView event level:** check admin permission in `handle_event("start_import", ...)` +2. **UI level:** render import section only for admin users +3. **Static templates:** public assets (no authorization needed) + +Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string checks where possible. + +### Safety Limits + +- File size enforced by `allow_upload` (`max_file_size`) +- Row count enforced in `MemberCSV.prepare/2` before processing starts +- Chunking is done via **LiveView `handle_info` loop** (sequential, cooperative scheduling) + +--- + +## Implementation Issues + +### Issue #1: CSV Specification & Static Template Files + +**Dependencies:** None + +**Goal:** Define CSV contract and add static templates. + +**Tasks:** +- [ ] Finalize header mapping variants +- [ ] Document normalization rules +- [ ] Document delimiter detection strategy +- [ ] Create templates in `priv/static/templates/` (UTF-8 with BOM) +- [ ] Document template URLs and how to link them from LiveView +- [ ] Document line number semantics (physical CSV line numbers) + +**Definition of Done:** +- [ ] Templates open cleanly in Excel/LibreOffice +- [ ] CSV spec section complete + +--- + +### Issue #2: Import Service Module Skeleton + +**Dependencies:** None + +**Goal:** Create service API and error types. + +**API (recommended):** +- `prepare/2` — parse + map + limit checks, returns import_state +- `process_chunk/3` — process one chunk (pure-ish), returns per-chunk results + +**Tasks:** +- [ ] Create `lib/mv/membership/import/member_csv.ex` +- [ ] Define public function: `prepare/2 (file_content, opts \\ [])` +- [ ] Define public function: `process_chunk/3 (chunk_rows_with_lines, column_map, opts \\ [])` +- [ ] Define error struct: `%MemberCSV.Error{csv_line_number: integer, field: atom | nil, message: String.t}` +- [ ] Document module + API + +--- + +### Issue #3: CSV Parsing + Delimiter Auto-Detection + BOM Handling + +**Dependencies:** Issue #2 + +**Goal:** Parse CSV robustly with correct delimiter detection and BOM handling. + +**Tasks:** +- [ ] Verify/add NimbleCSV dependency (`{:nimble_csv, "~> 1.0"}`) +- [ ] Create `lib/mv/membership/import/csv_parser.ex` +- [ ] Implement `strip_bom/1` and apply it **before** any header handling +- [ ] Handle `\r\n` and `\n` line endings (trim `\r` on header record) +- [ ] Detect delimiter via header recognition (try `;` and `,`) +- [ ] Parse CSV and return: + - `headers :: [String.t()]` + - `rows :: [{csv_line_number, [String.t()]}]` or directly `[{csv_line_number, row_map}]` +- [ ] Skip completely empty records (but preserve correct physical line numbers) +- [ ] Return `{:ok, headers, rows}` or `{:error, reason}` + +**Definition of Done:** +- [ ] BOM handling works (Excel exports) +- [ ] Delimiter detection works reliably +- [ ] Rows carry correct `csv_line_number` + +--- + +### Issue #4: Header Normalization + Per-Header Mapping (No Language Detection) + +**Dependencies:** Issue #3 + +**Goal:** Map each header individually to canonical fields (normalized comparison). + +**Tasks:** +- [ ] Create `lib/mv/membership/import/header_mapper.ex` +- [ ] Implement `normalize_header/1` +- [ ] Normalize mapping variants once and compare normalized strings +- [ ] Build `column_map` (canonical field -> column index) +- [ ] **Early abort if required headers missing** (`first_name`, `last_name`, `email`) +- [ ] Ignore unknown columns + +**Definition of Done:** +- [ ] English/German headers map correctly +- [ ] Missing required columns fails fast + +--- + +### Issue #5: Validation (Required Fields) + Error Formatting + +**Dependencies:** Issue #4 + +**Goal:** Validate each row and return structured, translatable errors. + +**Tasks:** +- [ ] Implement `validate_row/3 (row_map, csv_line_number, opts)` +- [ ] Required field presence (`first_name`, `last_name`, `email`) +- [ ] Email format validation (EctoCommons.EmailValidator) +- [ ] Trim values before validation +- [ ] Gettext-backed error messages + +--- + +### Issue #6: Persistence via Ash Create + Per-Row Error Capture (Chunked Processing) + +**Dependencies:** Issue #5 + +**Goal:** Create members and capture errors per row with correct CSV line numbers. + +**Tasks:** +- [ ] Implement `process_chunk/3` in service: + - Input: `[{csv_line_number, row_map}]` + - Validate + create sequentially + - Collect counts + first 50 errors (per import overall; LiveView enforces cap across chunks) +- [ ] Implement Ash error formatter helper: + - Convert `Ash.Error.Invalid` into `%MemberCSV.Error{}` + - Prefer field-level errors where possible (attach `field` atom) + - Handle unique email constraint error as user-friendly message +- [ ] Map row_map to Ash attrs (`%{first_name: ..., ...}`) + +**Important:** **Do not recompute line numbers** in this layer—use the ones provided by the parser. + +--- + +### Issue #7: Admin Global Settings LiveView UI (Upload + Start Import + Results + Template Links) + +**Dependencies:** Issue #6 + +**Goal:** UI section with upload, progress, results, and template links. + +**Tasks:** +- [ ] Render import section only for admins +- [ ] Configure `allow_upload/3`: + - `.csv` only, `max_entries: 1`, `max_file_size: 10MB`, `auto_upload: false` +- [ ] `handle_event("start_import", ...)`: + - Admin permission check + - Consume upload -> read file content + - Call `MemberCSV.prepare/2` + - Store `import_state` in assigns (chunks + column_map + metadata) + - Initialize progress assigns + - `send(self(), {:process_chunk, 0})` +- [ ] `handle_info({:process_chunk, idx}, socket)`: + - Fetch chunk from `import_state` + - Call `MemberCSV.process_chunk/3` + - Merge counts/errors into progress assigns (cap errors at 50 overall) + - Schedule next chunk (or finish and show results) +- [ ] Results UI: + - Success count + - Failure count + - Error list (line number + message + field) + +**Template links:** +- Link `/templates/member_import_en.csv` and `/templates/member_import_de.csv` via Phoenix static path helpers. + +--- + +### Issue #8: Authorization + Limits + +**Dependencies:** None (can be parallelized) + +**Goal:** Ensure admin-only access and enforce limits. + +**Tasks:** +- [ ] Admin check in start import event handler +- [ ] File size enforced in upload config +- [ ] Row limit enforced in `MemberCSV.prepare/2` (max_rows from config) +- [ ] Configuration: + ```elixir + config :mv, csv_import: [ + max_file_size_mb: 10, + max_rows: 1000 + ] + ``` + +--- + +### Issue #9: End-to-End LiveView Tests + Fixtures + +**Dependencies:** Issue #7 and #8 + +**Tasks:** +- [ ] Fixtures: + - valid EN/DE + - invalid + - too many rows (1,001) + - BOM + `;` delimiter fixture + - fixture with empty line(s) to validate correct line numbers +- [ ] LiveView tests: + - admin sees section, non-admin does not + - upload + start import + - success + error rendering + - row limit + file size errors + +--- + +### Issue #10: Documentation Polish (Inline Help Text + Docs) + +**Dependencies:** Issue #9 + +**Tasks:** +- [ ] UI help text + translations +- [ ] CHANGELOG entry +- [ ] Ensure moduledocs/docs + +--- + +### Issue #11: Custom Field Import (Optional - v1.1) + +**Dependencies:** Issue #10 +**Status:** Optional + +*(unchanged — intentionally deferred)* + +--- + +## Rollout & Risks + +### Rollout Strategy +- Dev → Staging → Production (with anonymized real-world CSV tests) + +### Risks & Mitigations + +| Risk | Impact | Likelihood | Mitigation | +|---|---:|---:|---| +| Large import timeout | High | Medium | 10 MB + 1,000 rows, chunking via `handle_info` | +| Encoding issues | Medium | Medium | BOM stripping, templates with BOM | +| Invalid CSV format | Medium | High | Clear errors + templates | +| Duplicate emails | Low | High | Ash constraint error -> user-friendly message | +| Performance (no background jobs) | Medium | Low | Small limits, sequential chunk processing | +| Admin access bypass | High | Low | Event-level auth + UI hiding | +| Data corruption | High | Low | Per-row validation + best-effort | + +--- + +## Appendix + +### Module File Structure + +``` +lib/ +├── mv/ +│ └── membership/ +│ └── import/ +│ ├── member_csv.ex # prepare + process_chunk +│ ├── csv_parser.ex # delimiter detection + parsing + BOM handling +│ └── header_mapper.ex # normalization + header mapping +└── mv_web/ + └── live/ + └── global_settings_live.ex # add import section + LV message loop + +priv/ +└── static/ + └── templates/ + ├── member_import_en.csv + └── member_import_de.csv + +test/ +├── mv/ +│ └── membership/ +│ └── import/ +│ ├── member_csv_test.exs +│ ├── csv_parser_test.exs +│ └── header_mapper_test.exs +└── fixtures/ + ├── member_import_en.csv + ├── member_import_de.csv + ├── member_import_invalid.csv + ├── member_import_large.csv + └── member_import_empty_lines.csv +``` + +### Example Usage (LiveView) + +```elixir +def handle_event("start_import", _params, socket) do + assert_admin!(socket.assigns.current_user) + + [{_name, content}] = + consume_uploaded_entries(socket, :csv_file, fn %{path: path}, _entry -> + {:ok, File.read!(path)} + end) + + case Mv.Membership.Import.MemberCSV.prepare(content) do + {:ok, import_state} -> + socket = + socket + |> assign(:import_state, import_state) + |> assign(:import_progress, %{processed: 0, inserted: 0, failed: 0, errors: []}) + |> assign(:importing?, true) + + send(self(), {:process_chunk, 0}) + {:noreply, socket} + + {:error, reason} -> + {:noreply, put_flash(socket, :error, reason)} + end +end + +def handle_info({:process_chunk, idx}, socket) do + %{chunks: chunks, column_map: column_map} = socket.assigns.import_state + + case Enum.at(chunks, idx) do + nil -> + {:noreply, assign(socket, importing?: false)} + + chunk_rows_with_lines -> + {:ok, chunk_result} = + Mv.Membership.Import.MemberCSV.process_chunk(chunk_rows_with_lines, column_map) + + socket = merge_progress(socket, chunk_result) # caps errors at 50 overall + + send(self(), {:process_chunk, idx + 1}) + {:noreply, socket} + end +end +``` + +--- + +**End of Implementation Plan** \ No newline at end of file