docs: adds implementation plan
All checks were successful
continuous-integration/drone/push Build is passing

This commit is contained in:
carla 2025-12-23 18:12:05 +01:00
parent b6bc182510
commit 828e09dce1

View file

@ -0,0 +1,611 @@
# CSV Member Import v1 - Implementation Plan
**Version:** 1.0
**Date:** 2025-01-XX
**Status:** Ready for Implementation
**Related Documents:**
- [Feature Roadmap](./feature-roadmap.md) - Overall feature planning
---
## Table of Contents
- [Overview & Scope](#overview--scope)
- [UX Flow](#ux-flow)
- [CSV Specification](#csv-specification)
- [Technical Design Notes](#technical-design-notes)
- [Implementation Issues](#implementation-issues)
- [Rollout & Risks](#rollout--risks)
---
## Overview & Scope
### What We're Building
A **basic CSV member import feature** that allows administrators to upload a CSV file and import new members into the system. This is a **v1 minimal implementation** focused on establishing the import structure without advanced features.
**Core Functionality (v1 Minimal):**
- Upload CSV file via LiveView file upload
- Parse CSV with bilingual header support for core member fields (English/German)
- Auto-detect delimiter (`;` or `,`) using header recognition
- Map CSV columns to core member fields (`first_name`, `last_name`, `email`, `phone_number`, `street`, `postal_code`, `city`)
- Validate each row (required fields: `first_name`, `last_name`, `email`)
- Create members via Ash resource (one-by-one, **no background jobs**, processed in chunks of 200 rows via LiveView messages)
- Display import results: success count, error count, and error details
- Provide static CSV templates (EN/DE)
**Optional Enhancement (v1.1 - Last Issue):**
- Custom field import (if time permits, otherwise defer to v2)
**Key Constraints (v1):**
- ✅ **Admin-only feature**
- ✅ **No upsert** (create only)
- ✅ **No deduplication** (duplicate emails fail and show as errors)
- ✅ **No mapping wizard** (fixed header mapping via bilingual variants)
- ✅ **No background jobs** (progress via LiveView `handle_info`)
- ✅ **Best-effort import** (row-by-row, no rollback)
- ✅ **UI-only error display** (no error CSV export)
- ✅ **Safety limits** (10 MB, 1,000 rows, chunks of 200)
### Out of Scope (v1)
**Deferred to Future Versions:**
- ❌ Upsert/update existing members
- ❌ Advanced deduplication strategies
- ❌ Column mapping wizard UI
- ❌ Background job processing (Oban/GenStage)
- ❌ Transactional all-or-nothing import
- ❌ Error CSV export/download
- ⚠️ Custom field import (optional, last issue - defer to v2 if scope is tight)
- ❌ Batch validation preview before import
- ❌ Date/boolean field parsing
- ❌ Dynamic template generation
- ❌ Import history/audit log
- ❌ Import templates for other entities
---
## UX Flow
### Access & Location
**Entry Point:**
- **Location:** Global Settings page (`/settings`)
- **UI Element:** New section "Import Members (CSV)" below "Custom Fields" section
- **Access Control:** Admin-only (enforced at LiveView event level, not entire `/settings` route)
### User Journey
1. **Navigate to Global Settings**
2. **Access Import Section**
- Upload area (drag & drop or file picker)
- Template download links (English / German)
- Help text explaining CSV format
3. **Download Template (Optional)**
4. **Prepare CSV File**
5. **Upload CSV**
6. **Start Import**
- Runs server-side via LiveView messages (may take up to ~30 seconds for large files)
7. **View Results**
- Success count
- Error count
- First 50 errors, each with:
- **CSV line number** (header is line 1, first data record begins at line 2)
- Error message
- Field name (if applicable)
### Error Handling
- **File too large:** Flash error before upload starts
- **Too many rows:** Flash error before import starts
- **Invalid CSV format:** Error shown in results
- **Partial success:** Results show both success and error counts
---
## CSV Specification
### Delimiter
**Recommended:** Semicolon (`;`)
**Supported:** `;` and `,`
**Auto-Detection (Header Recognition):**
- Remove UTF-8 BOM *first*
- Extract header record and try parsing with both delimiters
- For each delimiter, count how many recognized headers are present (via normalized variants)
- Choose delimiter with higher recognition; prefer `;` if tied
- If neither yields recognized headers, default to `;`
### Quoting Rules
- Fields may be quoted with double quotes (`"`)
- Escaped quotes: `""` inside quoted field represents a single `"`
- **v1 assumption:** CSV records do **not** contain embedded newlines inside quoted fields. (If they do, parsing may fail or line numbers may be inaccurate.)
### Column Headers
**v1 Supported Fields (Core Member Fields Only):**
- `first_name` / `Vorname` (required)
- `last_name` / `Nachname` (required)
- `email` / `E-Mail` (required)
- `phone_number` / `Telefon` (optional)
- `street` / `Straße` (optional)
- `postal_code` / `PLZ` / `Postleitzahl` (optional)
- `city` / `Stadt` (optional)
**Member Field Header Mapping:**
| Canonical Field | English Variants | German Variants |
|---|---|---|
| `first_name` | `first_name`, `firstname` | `Vorname`, `vorname` |
| `last_name` | `last_name`, `lastname`, `surname` | `Nachname`, `nachname`, `Familienname` |
| `email` | `email`, `e-mail`, `e_mail` | `E-Mail`, `e-mail`, `e_mail` |
| `phone_number` | `phone_number`, `phone`, `telephone` | `Telefon`, `telefon` |
| `street` | `street`, `address` | `Straße`, `strasse`, `Strasse` |
| `postal_code` | `postal_code`, `zip`, `postcode` | `PLZ`, `plz`, `Postleitzahl`, `postleitzahl` |
| `city` | `city`, `town` | `Stadt`, `stadt`, `Ort` |
**Header Normalization (used consistently for both input headers AND mapping variants):**
- Trim whitespace
- Convert to lowercase
- Normalize Unicode: `ß``ss` (e.g., `Straße``strasse`)
- Replace hyphens/whitespace with underscores: `E-Mail``e_mail`, `phone number``phone_number`
- Collapse multiple underscores: `e__mail``e_mail`
- Case-insensitive matching
**Unknown columns:** ignored (no error)
**Required fields:** `first_name`, `last_name`, `email`
### CSV Template Files
**Location:**
- `priv/static/templates/member_import_en.csv`
- `priv/static/templates/member_import_de.csv`
**Content:**
- Header row with required + common optional fields
- One example row
- Uses semicolon delimiter (`;`)
- UTF-8 encoding **with BOM** (Excel compatibility)
**Template Access:**
- Templates are static files in `priv/static/templates/`
- Served at:
- `/templates/member_import_en.csv`
- `/templates/member_import_de.csv`
- In LiveView, link them using Phoenix static path helpers (e.g. `~p` or `Routes.static_path/2`, depending on Phoenix version).
### File Limits
- **Max file size:** 10 MB
- **Max rows:** 1,000 rows (excluding header)
- **Processing:** chunks of 200 (via LiveView messages)
- **Encoding:** UTF-8 (BOM handled)
---
## Technical Design Notes
### Architecture Overview
```
┌─────────────────┐
│ LiveView UI │ (GlobalSettingsLive or component)
│ - Upload area │
│ - Progress │
│ - Results │
└────────┬────────┘
│ prepare
┌─────────────────────────────┐
│ Import Service │ (Mv.Membership.Import.MemberCSV)
│ - parse + map + limit checks│ -> returns import_state
│ - process_chunk(chunk) │ -> returns chunk results
└────────┬────────────────────┘
│ create
┌─────────────────┐
│ Ash Resource │ (Mv.Membership.Member)
│ - Create │
└─────────────────┘
```
### Technology Stack
- **Phoenix LiveView:** file upload via `allow_upload/3`
- **NimbleCSV:** CSV parsing (add explicit dependency if missing)
- **Ash Resource:** member creation via `Membership.create_member/1`
- **Gettext:** bilingual UI/error messages
### Module Structure
**New Modules:**
- `lib/mv/membership/import/member_csv.ex` - import orchestration + chunk processing
- `lib/mv/membership/import/csv_parser.ex` - delimiter detection + parsing + BOM handling
- `lib/mv/membership/import/header_mapper.ex` - normalization + header mapping
**Modified Modules:**
- `lib/mv_web/live/global_settings_live.ex` - render import section, handle upload/events/messages
### Data Flow
1. **Upload:** LiveView receives file via `allow_upload`
2. **Consume:** `consume_uploaded_entries/3` reads file content
3. **Prepare:** `MemberCSV.prepare/2`
- Strip BOM
- Detect delimiter (header recognition)
- Parse header + rows
- Map headers to canonical fields
- Early abort if required headers missing
- Row count check
- Return `import_state` containing chunks and metadata
4. **Process:** LiveView drives chunk processing via `handle_info`
- For each chunk: validate + create + collect errors
5. **Results:** LiveView shows progress + final summary
### Types & Key Consistency
- **Raw CSV parsing:** returns headers as list of strings, and rows **with csv line numbers**
- **Header mapping:** operates on normalized strings; mapping table variants are normalized once
- **Ash attrs:** built as atom-keyed map (`%{first_name: ..., ...}`)
### Error Model
```elixir
%{
csv_line_number: 5, # physical line number in the CSV file
field: :email, # optional
message: "is not a valid email"
}
```
### CSV Line Numbers (Important)
To keep error reporting user-friendly and accurate, **row errors must reference the physical line number in the original file**, even if empty lines are skipped.
**Design decision:** the parser returns rows as:
```elixir
rows :: [{csv_line_number :: pos_integer(), row_map :: map()}]
```
Downstream logic must **not** recompute line numbers from row indexes.
### Authorization
**Enforcement points:**
1. **LiveView event level:** check admin permission in `handle_event("start_import", ...)`
2. **UI level:** render import section only for admin users
3. **Static templates:** public assets (no authorization needed)
Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string checks where possible.
### Safety Limits
- File size enforced by `allow_upload` (`max_file_size`)
- Row count enforced in `MemberCSV.prepare/2` before processing starts
- Chunking is done via **LiveView `handle_info` loop** (sequential, cooperative scheduling)
---
## Implementation Issues
### Issue #1: CSV Specification & Static Template Files
**Dependencies:** None
**Goal:** Define CSV contract and add static templates.
**Tasks:**
- [ ] Finalize header mapping variants
- [ ] Document normalization rules
- [ ] Document delimiter detection strategy
- [ ] Create templates in `priv/static/templates/` (UTF-8 with BOM)
- [ ] Document template URLs and how to link them from LiveView
- [ ] Document line number semantics (physical CSV line numbers)
**Definition of Done:**
- [ ] Templates open cleanly in Excel/LibreOffice
- [ ] CSV spec section complete
---
### Issue #2: Import Service Module Skeleton
**Dependencies:** None
**Goal:** Create service API and error types.
**API (recommended):**
- `prepare/2` — parse + map + limit checks, returns import_state
- `process_chunk/3` — process one chunk (pure-ish), returns per-chunk results
**Tasks:**
- [ ] Create `lib/mv/membership/import/member_csv.ex`
- [ ] Define public function: `prepare/2 (file_content, opts \\ [])`
- [ ] Define public function: `process_chunk/3 (chunk_rows_with_lines, column_map, opts \\ [])`
- [ ] Define error struct: `%MemberCSV.Error{csv_line_number: integer, field: atom | nil, message: String.t}`
- [ ] Document module + API
---
### Issue #3: CSV Parsing + Delimiter Auto-Detection + BOM Handling
**Dependencies:** Issue #2
**Goal:** Parse CSV robustly with correct delimiter detection and BOM handling.
**Tasks:**
- [ ] Verify/add NimbleCSV dependency (`{:nimble_csv, "~> 1.0"}`)
- [ ] Create `lib/mv/membership/import/csv_parser.ex`
- [ ] Implement `strip_bom/1` and apply it **before** any header handling
- [ ] Handle `\r\n` and `\n` line endings (trim `\r` on header record)
- [ ] Detect delimiter via header recognition (try `;` and `,`)
- [ ] Parse CSV and return:
- `headers :: [String.t()]`
- `rows :: [{csv_line_number, [String.t()]}]` or directly `[{csv_line_number, row_map}]`
- [ ] Skip completely empty records (but preserve correct physical line numbers)
- [ ] Return `{:ok, headers, rows}` or `{:error, reason}`
**Definition of Done:**
- [ ] BOM handling works (Excel exports)
- [ ] Delimiter detection works reliably
- [ ] Rows carry correct `csv_line_number`
---
### Issue #4: Header Normalization + Per-Header Mapping (No Language Detection)
**Dependencies:** Issue #3
**Goal:** Map each header individually to canonical fields (normalized comparison).
**Tasks:**
- [ ] Create `lib/mv/membership/import/header_mapper.ex`
- [ ] Implement `normalize_header/1`
- [ ] Normalize mapping variants once and compare normalized strings
- [ ] Build `column_map` (canonical field -> column index)
- [ ] **Early abort if required headers missing** (`first_name`, `last_name`, `email`)
- [ ] Ignore unknown columns
**Definition of Done:**
- [ ] English/German headers map correctly
- [ ] Missing required columns fails fast
---
### Issue #5: Validation (Required Fields) + Error Formatting
**Dependencies:** Issue #4
**Goal:** Validate each row and return structured, translatable errors.
**Tasks:**
- [ ] Implement `validate_row/3 (row_map, csv_line_number, opts)`
- [ ] Required field presence (`first_name`, `last_name`, `email`)
- [ ] Email format validation (EctoCommons.EmailValidator)
- [ ] Trim values before validation
- [ ] Gettext-backed error messages
---
### Issue #6: Persistence via Ash Create + Per-Row Error Capture (Chunked Processing)
**Dependencies:** Issue #5
**Goal:** Create members and capture errors per row with correct CSV line numbers.
**Tasks:**
- [ ] Implement `process_chunk/3` in service:
- Input: `[{csv_line_number, row_map}]`
- Validate + create sequentially
- Collect counts + first 50 errors (per import overall; LiveView enforces cap across chunks)
- [ ] Implement Ash error formatter helper:
- Convert `Ash.Error.Invalid` into `%MemberCSV.Error{}`
- Prefer field-level errors where possible (attach `field` atom)
- Handle unique email constraint error as user-friendly message
- [ ] Map row_map to Ash attrs (`%{first_name: ..., ...}`)
**Important:** **Do not recompute line numbers** in this layer—use the ones provided by the parser.
---
### Issue #7: Admin Global Settings LiveView UI (Upload + Start Import + Results + Template Links)
**Dependencies:** Issue #6
**Goal:** UI section with upload, progress, results, and template links.
**Tasks:**
- [ ] Render import section only for admins
- [ ] Configure `allow_upload/3`:
- `.csv` only, `max_entries: 1`, `max_file_size: 10MB`, `auto_upload: false`
- [ ] `handle_event("start_import", ...)`:
- Admin permission check
- Consume upload -> read file content
- Call `MemberCSV.prepare/2`
- Store `import_state` in assigns (chunks + column_map + metadata)
- Initialize progress assigns
- `send(self(), {:process_chunk, 0})`
- [ ] `handle_info({:process_chunk, idx}, socket)`:
- Fetch chunk from `import_state`
- Call `MemberCSV.process_chunk/3`
- Merge counts/errors into progress assigns (cap errors at 50 overall)
- Schedule next chunk (or finish and show results)
- [ ] Results UI:
- Success count
- Failure count
- Error list (line number + message + field)
**Template links:**
- Link `/templates/member_import_en.csv` and `/templates/member_import_de.csv` via Phoenix static path helpers.
---
### Issue #8: Authorization + Limits
**Dependencies:** None (can be parallelized)
**Goal:** Ensure admin-only access and enforce limits.
**Tasks:**
- [ ] Admin check in start import event handler
- [ ] File size enforced in upload config
- [ ] Row limit enforced in `MemberCSV.prepare/2` (max_rows from config)
- [ ] Configuration:
```elixir
config :mv, csv_import: [
max_file_size_mb: 10,
max_rows: 1000
]
```
---
### Issue #9: End-to-End LiveView Tests + Fixtures
**Dependencies:** Issue #7 and #8
**Tasks:**
- [ ] Fixtures:
- valid EN/DE
- invalid
- too many rows (1,001)
- BOM + `;` delimiter fixture
- fixture with empty line(s) to validate correct line numbers
- [ ] LiveView tests:
- admin sees section, non-admin does not
- upload + start import
- success + error rendering
- row limit + file size errors
---
### Issue #10: Documentation Polish (Inline Help Text + Docs)
**Dependencies:** Issue #9
**Tasks:**
- [ ] UI help text + translations
- [ ] CHANGELOG entry
- [ ] Ensure moduledocs/docs
---
### Issue #11: Custom Field Import (Optional - v1.1)
**Dependencies:** Issue #10
**Status:** Optional
*(unchanged — intentionally deferred)*
---
## Rollout & Risks
### Rollout Strategy
- Dev → Staging → Production (with anonymized real-world CSV tests)
### Risks & Mitigations
| Risk | Impact | Likelihood | Mitigation |
|---|---:|---:|---|
| Large import timeout | High | Medium | 10 MB + 1,000 rows, chunking via `handle_info` |
| Encoding issues | Medium | Medium | BOM stripping, templates with BOM |
| Invalid CSV format | Medium | High | Clear errors + templates |
| Duplicate emails | Low | High | Ash constraint error -> user-friendly message |
| Performance (no background jobs) | Medium | Low | Small limits, sequential chunk processing |
| Admin access bypass | High | Low | Event-level auth + UI hiding |
| Data corruption | High | Low | Per-row validation + best-effort |
---
## Appendix
### Module File Structure
```
lib/
├── mv/
│ └── membership/
│ └── import/
│ ├── member_csv.ex # prepare + process_chunk
│ ├── csv_parser.ex # delimiter detection + parsing + BOM handling
│ └── header_mapper.ex # normalization + header mapping
└── mv_web/
└── live/
└── global_settings_live.ex # add import section + LV message loop
priv/
└── static/
└── templates/
├── member_import_en.csv
└── member_import_de.csv
test/
├── mv/
│ └── membership/
│ └── import/
│ ├── member_csv_test.exs
│ ├── csv_parser_test.exs
│ └── header_mapper_test.exs
└── fixtures/
├── member_import_en.csv
├── member_import_de.csv
├── member_import_invalid.csv
├── member_import_large.csv
└── member_import_empty_lines.csv
```
### Example Usage (LiveView)
```elixir
def handle_event("start_import", _params, socket) do
assert_admin!(socket.assigns.current_user)
[{_name, content}] =
consume_uploaded_entries(socket, :csv_file, fn %{path: path}, _entry ->
{:ok, File.read!(path)}
end)
case Mv.Membership.Import.MemberCSV.prepare(content) do
{:ok, import_state} ->
socket =
socket
|> assign(:import_state, import_state)
|> assign(:import_progress, %{processed: 0, inserted: 0, failed: 0, errors: []})
|> assign(:importing?, true)
send(self(), {:process_chunk, 0})
{:noreply, socket}
{:error, reason} ->
{:noreply, put_flash(socket, :error, reason)}
end
end
def handle_info({:process_chunk, idx}, socket) do
%{chunks: chunks, column_map: column_map} = socket.assigns.import_state
case Enum.at(chunks, idx) do
nil ->
{:noreply, assign(socket, importing?: false)}
chunk_rows_with_lines ->
{:ok, chunk_result} =
Mv.Membership.Import.MemberCSV.process_chunk(chunk_rows_with_lines, column_map)
socket = merge_progress(socket, chunk_result) # caps errors at 50 overall
send(self(), {:process_chunk, idx + 1})
{:noreply, socket}
end
end
```
---
**End of Implementation Plan**