# CSV Member Import v1 - Implementation Plan **Version:** 1.0 **Date:** 2025-01-XX **Status:** Ready for Implementation **Related Documents:** - [Feature Roadmap](./feature-roadmap.md) - Overall feature planning --- ## Table of Contents - [Overview & Scope](#overview--scope) - [UX Flow](#ux-flow) - [CSV Specification](#csv-specification) - [Technical Design Notes](#technical-design-notes) - [Implementation Issues](#implementation-issues) - [Rollout & Risks](#rollout--risks) --- ## Overview & Scope ### What We're Building A **basic CSV member import feature** that allows administrators to upload a CSV file and import new members into the system. This is a **v1 minimal implementation** focused on establishing the import structure without advanced features. **Core Functionality (v1 Minimal):** - Upload CSV file via LiveView file upload - Parse CSV with bilingual header support for core member fields (English/German) - Auto-detect delimiter (`;` or `,`) using header recognition - Map CSV columns to core member fields (`first_name`, `last_name`, `email`, `phone_number`, `street`, `postal_code`, `city`) - Validate each row (required fields: `first_name`, `last_name`, `email`) - Create members via Ash resource (one-by-one, **no background jobs**, processed in chunks of 200 rows via LiveView messages) - Display import results: success count, error count, and error details - Provide static CSV templates (EN/DE) **Optional Enhancement (v1.1 - Last Issue):** - Custom field import (if time permits, otherwise defer to v2) **Key Constraints (v1):** - ✅ **Admin-only feature** - ✅ **No upsert** (create only) - ✅ **No deduplication** (duplicate emails fail and show as errors) - ✅ **No mapping wizard** (fixed header mapping via bilingual variants) - ✅ **No background jobs** (progress via LiveView `handle_info`) - ✅ **Best-effort import** (row-by-row, no rollback) - ✅ **UI-only error display** (no error CSV export) - ✅ **Safety limits** (10 MB, 1,000 rows, chunks of 200) ### Out of Scope (v1) **Deferred to Future Versions:** - ❌ Upsert/update existing members - ❌ Advanced deduplication strategies - ❌ Column mapping wizard UI - ❌ Background job processing (Oban/GenStage) - ❌ Transactional all-or-nothing import - ❌ Error CSV export/download - ⚠️ Custom field import (optional, last issue - defer to v2 if scope is tight) - ❌ Batch validation preview before import - ❌ Date/boolean field parsing - ❌ Dynamic template generation - ❌ Import history/audit log - ❌ Import templates for other entities --- ## UX Flow ### Access & Location **Entry Point:** - **Location:** Global Settings page (`/settings`) - **UI Element:** New section "Import Members (CSV)" below "Custom Fields" section - **Access Control:** Admin-only (enforced at LiveView event level, not entire `/settings` route) ### User Journey 1. **Navigate to Global Settings** 2. **Access Import Section** - Upload area (drag & drop or file picker) - Template download links (English / German) - Help text explaining CSV format 3. **Download Template (Optional)** 4. **Prepare CSV File** 5. **Upload CSV** 6. **Start Import** - Runs server-side via LiveView messages (may take up to ~30 seconds for large files) 7. **View Results** - Success count - Error count - First 50 errors, each with: - **CSV line number** (header is line 1, first data record begins at line 2) - Error message - Field name (if applicable) ### Error Handling - **File too large:** Flash error before upload starts - **Too many rows:** Flash error before import starts - **Invalid CSV format:** Error shown in results - **Partial success:** Results show both success and error counts --- ## CSV Specification ### Delimiter **Recommended:** Semicolon (`;`) **Supported:** `;` and `,` **Auto-Detection (Header Recognition):** - Remove UTF-8 BOM *first* - Extract header record and try parsing with both delimiters - For each delimiter, count how many recognized headers are present (via normalized variants) - Choose delimiter with higher recognition; prefer `;` if tied - If neither yields recognized headers, default to `;` ### Quoting Rules - Fields may be quoted with double quotes (`"`) - Escaped quotes: `""` inside quoted field represents a single `"` - **v1 assumption:** CSV records do **not** contain embedded newlines inside quoted fields. (If they do, parsing may fail or line numbers may be inaccurate.) ### Column Headers **v1 Supported Fields (Core Member Fields Only):** - `first_name` / `Vorname` (required) - `last_name` / `Nachname` (required) - `email` / `E-Mail` (required) - `phone_number` / `Telefon` (optional) - `street` / `Straße` (optional) - `postal_code` / `PLZ` / `Postleitzahl` (optional) - `city` / `Stadt` (optional) **Member Field Header Mapping:** | Canonical Field | English Variants | German Variants | |---|---|---| | `first_name` | `first_name`, `firstname` | `Vorname`, `vorname` | | `last_name` | `last_name`, `lastname`, `surname` | `Nachname`, `nachname`, `Familienname` | | `email` | `email`, `e-mail`, `e_mail` | `E-Mail`, `e-mail`, `e_mail` | | `phone_number` | `phone_number`, `phone`, `telephone` | `Telefon`, `telefon` | | `street` | `street`, `address` | `Straße`, `strasse`, `Strasse` | | `postal_code` | `postal_code`, `zip`, `postcode` | `PLZ`, `plz`, `Postleitzahl`, `postleitzahl` | | `city` | `city`, `town` | `Stadt`, `stadt`, `Ort` | **Header Normalization (used consistently for both input headers AND mapping variants):** - Trim whitespace - Convert to lowercase - Normalize Unicode: `ß` → `ss` (e.g., `Straße` → `strasse`) - Replace hyphens/whitespace with underscores: `E-Mail` → `e_mail`, `phone number` → `phone_number` - Collapse multiple underscores: `e__mail` → `e_mail` - Case-insensitive matching **Unknown columns:** ignored (no error) **Required fields:** `first_name`, `last_name`, `email` ### CSV Template Files **Location:** - `priv/static/templates/member_import_en.csv` - `priv/static/templates/member_import_de.csv` **Content:** - Header row with required + common optional fields - One example row - Uses semicolon delimiter (`;`) - UTF-8 encoding **with BOM** (Excel compatibility) **Template Access:** - Templates are static files in `priv/static/templates/` - Served at: - `/templates/member_import_en.csv` - `/templates/member_import_de.csv` - In LiveView, link them using Phoenix static path helpers (e.g. `~p` or `Routes.static_path/2`, depending on Phoenix version). ### File Limits - **Max file size:** 10 MB - **Max rows:** 1,000 rows (excluding header) - **Processing:** chunks of 200 (via LiveView messages) - **Encoding:** UTF-8 (BOM handled) --- ## Technical Design Notes ### Architecture Overview ``` ┌─────────────────┐ │ LiveView UI │ (GlobalSettingsLive or component) │ - Upload area │ │ - Progress │ │ - Results │ └────────┬────────┘ │ prepare ▼ ┌─────────────────────────────┐ │ Import Service │ (Mv.Membership.Import.MemberCSV) │ - parse + map + limit checks│ -> returns import_state │ - process_chunk(chunk) │ -> returns chunk results └────────┬────────────────────┘ │ create ▼ ┌─────────────────┐ │ Ash Resource │ (Mv.Membership.Member) │ - Create │ └─────────────────┘ ``` ### Technology Stack - **Phoenix LiveView:** file upload via `allow_upload/3` - **NimbleCSV:** CSV parsing (add explicit dependency if missing) - **Ash Resource:** member creation via `Membership.create_member/1` - **Gettext:** bilingual UI/error messages ### Module Structure **New Modules:** - `lib/mv/membership/import/member_csv.ex` - import orchestration + chunk processing - `lib/mv/membership/import/csv_parser.ex` - delimiter detection + parsing + BOM handling - `lib/mv/membership/import/header_mapper.ex` - normalization + header mapping **Modified Modules:** - `lib/mv_web/live/global_settings_live.ex` - render import section, handle upload/events/messages ### Data Flow 1. **Upload:** LiveView receives file via `allow_upload` 2. **Consume:** `consume_uploaded_entries/3` reads file content 3. **Prepare:** `MemberCSV.prepare/2` - Strip BOM - Detect delimiter (header recognition) - Parse header + rows - Map headers to canonical fields - Early abort if required headers missing - Row count check - Return `import_state` containing chunks and metadata 4. **Process:** LiveView drives chunk processing via `handle_info` - For each chunk: validate + create + collect errors 5. **Results:** LiveView shows progress + final summary ### Types & Key Consistency - **Raw CSV parsing:** returns headers as list of strings, and rows **with csv line numbers** - **Header mapping:** operates on normalized strings; mapping table variants are normalized once - **Ash attrs:** built as atom-keyed map (`%{first_name: ..., ...}`) ### Error Model ```elixir %{ csv_line_number: 5, # physical line number in the CSV file field: :email, # optional message: "is not a valid email" } ``` ### CSV Line Numbers (Important) To keep error reporting user-friendly and accurate, **row errors must reference the physical line number in the original file**, even if empty lines are skipped. **Design decision:** the parser returns rows as: ```elixir rows :: [{csv_line_number :: pos_integer(), row_map :: map()}] ``` Downstream logic must **not** recompute line numbers from row indexes. ### Authorization **Enforcement points:** 1. **LiveView event level:** check admin permission in `handle_event("start_import", ...)` 2. **UI level:** render import section only for admin users 3. **Static templates:** public assets (no authorization needed) Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string checks where possible. ### Safety Limits - File size enforced by `allow_upload` (`max_file_size`) - Row count enforced in `MemberCSV.prepare/2` before processing starts - Chunking is done via **LiveView `handle_info` loop** (sequential, cooperative scheduling) --- ## Implementation Issues ### Issue #1: CSV Specification & Static Template Files **Dependencies:** None **Goal:** Define CSV contract and add static templates. **Tasks:** - [ ] Finalize header mapping variants - [ ] Document normalization rules - [ ] Document delimiter detection strategy - [ ] Create templates in `priv/static/templates/` (UTF-8 with BOM) - [ ] Document template URLs and how to link them from LiveView - [ ] Document line number semantics (physical CSV line numbers) **Definition of Done:** - [ ] Templates open cleanly in Excel/LibreOffice - [ ] CSV spec section complete --- ### Issue #2: Import Service Module Skeleton **Dependencies:** None **Goal:** Create service API and error types. **API (recommended):** - `prepare/2` — parse + map + limit checks, returns import_state - `process_chunk/3` — process one chunk (pure-ish), returns per-chunk results **Tasks:** - [ ] Create `lib/mv/membership/import/member_csv.ex` - [ ] Define public function: `prepare/2 (file_content, opts \\ [])` - [ ] Define public function: `process_chunk/3 (chunk_rows_with_lines, column_map, opts \\ [])` - [ ] Define error struct: `%MemberCSV.Error{csv_line_number: integer, field: atom | nil, message: String.t}` - [ ] Document module + API --- ### Issue #3: CSV Parsing + Delimiter Auto-Detection + BOM Handling **Dependencies:** Issue #2 **Goal:** Parse CSV robustly with correct delimiter detection and BOM handling. **Tasks:** - [ ] Verify/add NimbleCSV dependency (`{:nimble_csv, "~> 1.0"}`) - [ ] Create `lib/mv/membership/import/csv_parser.ex` - [ ] Implement `strip_bom/1` and apply it **before** any header handling - [ ] Handle `\r\n` and `\n` line endings (trim `\r` on header record) - [ ] Detect delimiter via header recognition (try `;` and `,`) - [ ] Parse CSV and return: - `headers :: [String.t()]` - `rows :: [{csv_line_number, [String.t()]}]` or directly `[{csv_line_number, row_map}]` - [ ] Skip completely empty records (but preserve correct physical line numbers) - [ ] Return `{:ok, headers, rows}` or `{:error, reason}` **Definition of Done:** - [ ] BOM handling works (Excel exports) - [ ] Delimiter detection works reliably - [ ] Rows carry correct `csv_line_number` --- ### Issue #4: Header Normalization + Per-Header Mapping (No Language Detection) **Dependencies:** Issue #3 **Goal:** Map each header individually to canonical fields (normalized comparison). **Tasks:** - [ ] Create `lib/mv/membership/import/header_mapper.ex` - [ ] Implement `normalize_header/1` - [ ] Normalize mapping variants once and compare normalized strings - [ ] Build `column_map` (canonical field -> column index) - [ ] **Early abort if required headers missing** (`first_name`, `last_name`, `email`) - [ ] Ignore unknown columns **Definition of Done:** - [ ] English/German headers map correctly - [ ] Missing required columns fails fast --- ### Issue #5: Validation (Required Fields) + Error Formatting **Dependencies:** Issue #4 **Goal:** Validate each row and return structured, translatable errors. **Tasks:** - [ ] Implement `validate_row/3 (row_map, csv_line_number, opts)` - [ ] Required field presence (`first_name`, `last_name`, `email`) - [ ] Email format validation (EctoCommons.EmailValidator) - [ ] Trim values before validation - [ ] Gettext-backed error messages --- ### Issue #6: Persistence via Ash Create + Per-Row Error Capture (Chunked Processing) **Dependencies:** Issue #5 **Goal:** Create members and capture errors per row with correct CSV line numbers. **Tasks:** - [ ] Implement `process_chunk/3` in service: - Input: `[{csv_line_number, row_map}]` - Validate + create sequentially - Collect counts + first 50 errors (per import overall; LiveView enforces cap across chunks) - [ ] Implement Ash error formatter helper: - Convert `Ash.Error.Invalid` into `%MemberCSV.Error{}` - Prefer field-level errors where possible (attach `field` atom) - Handle unique email constraint error as user-friendly message - [ ] Map row_map to Ash attrs (`%{first_name: ..., ...}`) **Important:** **Do not recompute line numbers** in this layer—use the ones provided by the parser. --- ### Issue #7: Admin Global Settings LiveView UI (Upload + Start Import + Results + Template Links) **Dependencies:** Issue #6 **Goal:** UI section with upload, progress, results, and template links. **Tasks:** - [ ] Render import section only for admins - [ ] Configure `allow_upload/3`: - `.csv` only, `max_entries: 1`, `max_file_size: 10MB`, `auto_upload: false` - [ ] `handle_event("start_import", ...)`: - Admin permission check - Consume upload -> read file content - Call `MemberCSV.prepare/2` - Store `import_state` in assigns (chunks + column_map + metadata) - Initialize progress assigns - `send(self(), {:process_chunk, 0})` - [ ] `handle_info({:process_chunk, idx}, socket)`: - Fetch chunk from `import_state` - Call `MemberCSV.process_chunk/3` - Merge counts/errors into progress assigns (cap errors at 50 overall) - Schedule next chunk (or finish and show results) - [ ] Results UI: - Success count - Failure count - Error list (line number + message + field) **Template links:** - Link `/templates/member_import_en.csv` and `/templates/member_import_de.csv` via Phoenix static path helpers. --- ### Issue #8: Authorization + Limits **Dependencies:** None (can be parallelized) **Goal:** Ensure admin-only access and enforce limits. **Tasks:** - [ ] Admin check in start import event handler - [ ] File size enforced in upload config - [ ] Row limit enforced in `MemberCSV.prepare/2` (max_rows from config) - [ ] Configuration: ```elixir config :mv, csv_import: [ max_file_size_mb: 10, max_rows: 1000 ] ``` --- ### Issue #9: End-to-End LiveView Tests + Fixtures **Dependencies:** Issue #7 and #8 **Tasks:** - [ ] Fixtures: - valid EN/DE - invalid - too many rows (1,001) - BOM + `;` delimiter fixture - fixture with empty line(s) to validate correct line numbers - [ ] LiveView tests: - admin sees section, non-admin does not - upload + start import - success + error rendering - row limit + file size errors --- ### Issue #10: Documentation Polish (Inline Help Text + Docs) **Dependencies:** Issue #9 **Tasks:** - [ ] UI help text + translations - [ ] CHANGELOG entry - [ ] Ensure moduledocs/docs --- ### Issue #11: Custom Field Import (Optional - v1.1) **Dependencies:** Issue #10 **Status:** Optional *(unchanged — intentionally deferred)* --- ## Rollout & Risks ### Rollout Strategy - Dev → Staging → Production (with anonymized real-world CSV tests) ### Risks & Mitigations | Risk | Impact | Likelihood | Mitigation | |---|---:|---:|---| | Large import timeout | High | Medium | 10 MB + 1,000 rows, chunking via `handle_info` | | Encoding issues | Medium | Medium | BOM stripping, templates with BOM | | Invalid CSV format | Medium | High | Clear errors + templates | | Duplicate emails | Low | High | Ash constraint error -> user-friendly message | | Performance (no background jobs) | Medium | Low | Small limits, sequential chunk processing | | Admin access bypass | High | Low | Event-level auth + UI hiding | | Data corruption | High | Low | Per-row validation + best-effort | --- ## Appendix ### Module File Structure ``` lib/ ├── mv/ │ └── membership/ │ └── import/ │ ├── member_csv.ex # prepare + process_chunk │ ├── csv_parser.ex # delimiter detection + parsing + BOM handling │ └── header_mapper.ex # normalization + header mapping └── mv_web/ └── live/ └── global_settings_live.ex # add import section + LV message loop priv/ └── static/ └── templates/ ├── member_import_en.csv └── member_import_de.csv test/ ├── mv/ │ └── membership/ │ └── import/ │ ├── member_csv_test.exs │ ├── csv_parser_test.exs │ └── header_mapper_test.exs └── fixtures/ ├── member_import_en.csv ├── member_import_de.csv ├── member_import_invalid.csv ├── member_import_large.csv └── member_import_empty_lines.csv ``` ### Example Usage (LiveView) ```elixir def handle_event("start_import", _params, socket) do assert_admin!(socket.assigns.current_user) [{_name, content}] = consume_uploaded_entries(socket, :csv_file, fn %{path: path}, _entry -> {:ok, File.read!(path)} end) case Mv.Membership.Import.MemberCSV.prepare(content) do {:ok, import_state} -> socket = socket |> assign(:import_state, import_state) |> assign(:import_progress, %{processed: 0, inserted: 0, failed: 0, errors: []}) |> assign(:importing?, true) send(self(), {:process_chunk, 0}) {:noreply, socket} {:error, reason} -> {:noreply, put_flash(socket, :error, reason)} end end def handle_info({:process_chunk, idx}, socket) do %{chunks: chunks, column_map: column_map} = socket.assigns.import_state case Enum.at(chunks, idx) do nil -> {:noreply, assign(socket, importing?: false)} chunk_rows_with_lines -> {:ok, chunk_result} = Mv.Membership.Import.MemberCSV.process_chunk(chunk_rows_with_lines, column_map) socket = merge_progress(socket, chunk_result) # caps errors at 50 overall send(self(), {:process_chunk, idx + 1}) {:noreply, socket} end end ``` --- **End of Implementation Plan**