mitgliederverwaltung/docs/csv-member-import-v1.md
carla 828e09dce1
All checks were successful
continuous-integration/drone/push Build is passing
docs: adds implementation plan
2025-12-23 18:12:05 +01:00

20 KiB

CSV Member Import v1 - Implementation Plan

Version: 1.0
Date: 2025-01-XX
Status: Ready for Implementation
Related Documents:


Table of Contents


Overview & Scope

What We're Building

A basic CSV member import feature that allows administrators to upload a CSV file and import new members into the system. This is a v1 minimal implementation focused on establishing the import structure without advanced features.

Core Functionality (v1 Minimal):

  • Upload CSV file via LiveView file upload
  • Parse CSV with bilingual header support for core member fields (English/German)
  • Auto-detect delimiter (; or ,) using header recognition
  • Map CSV columns to core member fields (first_name, last_name, email, phone_number, street, postal_code, city)
  • Validate each row (required fields: first_name, last_name, email)
  • Create members via Ash resource (one-by-one, no background jobs, processed in chunks of 200 rows via LiveView messages)
  • Display import results: success count, error count, and error details
  • Provide static CSV templates (EN/DE)

Optional Enhancement (v1.1 - Last Issue):

  • Custom field import (if time permits, otherwise defer to v2)

Key Constraints (v1):

  • Admin-only feature
  • No upsert (create only)
  • No deduplication (duplicate emails fail and show as errors)
  • No mapping wizard (fixed header mapping via bilingual variants)
  • No background jobs (progress via LiveView handle_info)
  • Best-effort import (row-by-row, no rollback)
  • UI-only error display (no error CSV export)
  • Safety limits (10 MB, 1,000 rows, chunks of 200)

Out of Scope (v1)

Deferred to Future Versions:

  • Upsert/update existing members
  • Advanced deduplication strategies
  • Column mapping wizard UI
  • Background job processing (Oban/GenStage)
  • Transactional all-or-nothing import
  • Error CSV export/download
  • ⚠️ Custom field import (optional, last issue - defer to v2 if scope is tight)
  • Batch validation preview before import
  • Date/boolean field parsing
  • Dynamic template generation
  • Import history/audit log
  • Import templates for other entities

UX Flow

Access & Location

Entry Point:

  • Location: Global Settings page (/settings)
  • UI Element: New section "Import Members (CSV)" below "Custom Fields" section
  • Access Control: Admin-only (enforced at LiveView event level, not entire /settings route)

User Journey

  1. Navigate to Global Settings
  2. Access Import Section
    • Upload area (drag & drop or file picker)
    • Template download links (English / German)
    • Help text explaining CSV format
  3. Download Template (Optional)
  4. Prepare CSV File
  5. Upload CSV
  6. Start Import
    • Runs server-side via LiveView messages (may take up to ~30 seconds for large files)
  7. View Results
    • Success count
    • Error count
    • First 50 errors, each with:
      • CSV line number (header is line 1, first data record begins at line 2)
      • Error message
      • Field name (if applicable)

Error Handling

  • File too large: Flash error before upload starts
  • Too many rows: Flash error before import starts
  • Invalid CSV format: Error shown in results
  • Partial success: Results show both success and error counts

CSV Specification

Delimiter

Recommended: Semicolon (;)
Supported: ; and ,

Auto-Detection (Header Recognition):

  • Remove UTF-8 BOM first
  • Extract header record and try parsing with both delimiters
  • For each delimiter, count how many recognized headers are present (via normalized variants)
  • Choose delimiter with higher recognition; prefer ; if tied
  • If neither yields recognized headers, default to ;

Quoting Rules

  • Fields may be quoted with double quotes (")
  • Escaped quotes: "" inside quoted field represents a single "
  • v1 assumption: CSV records do not contain embedded newlines inside quoted fields. (If they do, parsing may fail or line numbers may be inaccurate.)

Column Headers

v1 Supported Fields (Core Member Fields Only):

  • first_name / Vorname (required)
  • last_name / Nachname (required)
  • email / E-Mail (required)
  • phone_number / Telefon (optional)
  • street / Straße (optional)
  • postal_code / PLZ / Postleitzahl (optional)
  • city / Stadt (optional)

Member Field Header Mapping:

Canonical Field English Variants German Variants
first_name first_name, firstname Vorname, vorname
last_name last_name, lastname, surname Nachname, nachname, Familienname
email email, e-mail, e_mail E-Mail, e-mail, e_mail
phone_number phone_number, phone, telephone Telefon, telefon
street street, address Straße, strasse, Strasse
postal_code postal_code, zip, postcode PLZ, plz, Postleitzahl, postleitzahl
city city, town Stadt, stadt, Ort

Header Normalization (used consistently for both input headers AND mapping variants):

  • Trim whitespace
  • Convert to lowercase
  • Normalize Unicode: ßss (e.g., Straßestrasse)
  • Replace hyphens/whitespace with underscores: E-Maile_mail, phone numberphone_number
  • Collapse multiple underscores: e__maile_mail
  • Case-insensitive matching

Unknown columns: ignored (no error)

Required fields: first_name, last_name, email

CSV Template Files

Location:

  • priv/static/templates/member_import_en.csv
  • priv/static/templates/member_import_de.csv

Content:

  • Header row with required + common optional fields
  • One example row
  • Uses semicolon delimiter (;)
  • UTF-8 encoding with BOM (Excel compatibility)

Template Access:

  • Templates are static files in priv/static/templates/
  • Served at:
    • /templates/member_import_en.csv
    • /templates/member_import_de.csv
  • In LiveView, link them using Phoenix static path helpers (e.g. ~p or Routes.static_path/2, depending on Phoenix version).

File Limits

  • Max file size: 10 MB
  • Max rows: 1,000 rows (excluding header)
  • Processing: chunks of 200 (via LiveView messages)
  • Encoding: UTF-8 (BOM handled)

Technical Design Notes

Architecture Overview

┌─────────────────┐
│  LiveView UI    │  (GlobalSettingsLive or component)
│  - Upload area  │
│  - Progress     │
│  - Results      │
└────────┬────────┘
         │ prepare
         ▼
┌─────────────────────────────┐
│ Import Service              │  (Mv.Membership.Import.MemberCSV)
│ - parse + map + limit checks│  -> returns import_state
│ - process_chunk(chunk)      │  -> returns chunk results
└────────┬────────────────────┘
         │ create
         ▼
┌─────────────────┐
│  Ash Resource   │  (Mv.Membership.Member)
│  - Create       │
└─────────────────┘

Technology Stack

  • Phoenix LiveView: file upload via allow_upload/3
  • NimbleCSV: CSV parsing (add explicit dependency if missing)
  • Ash Resource: member creation via Membership.create_member/1
  • Gettext: bilingual UI/error messages

Module Structure

New Modules:

  • lib/mv/membership/import/member_csv.ex - import orchestration + chunk processing
  • lib/mv/membership/import/csv_parser.ex - delimiter detection + parsing + BOM handling
  • lib/mv/membership/import/header_mapper.ex - normalization + header mapping

Modified Modules:

  • lib/mv_web/live/global_settings_live.ex - render import section, handle upload/events/messages

Data Flow

  1. Upload: LiveView receives file via allow_upload
  2. Consume: consume_uploaded_entries/3 reads file content
  3. Prepare: MemberCSV.prepare/2
    • Strip BOM
    • Detect delimiter (header recognition)
    • Parse header + rows
    • Map headers to canonical fields
    • Early abort if required headers missing
    • Row count check
    • Return import_state containing chunks and metadata
  4. Process: LiveView drives chunk processing via handle_info
    • For each chunk: validate + create + collect errors
  5. Results: LiveView shows progress + final summary

Types & Key Consistency

  • Raw CSV parsing: returns headers as list of strings, and rows with csv line numbers
  • Header mapping: operates on normalized strings; mapping table variants are normalized once
  • Ash attrs: built as atom-keyed map (%{first_name: ..., ...})

Error Model

%{
  csv_line_number: 5,       # physical line number in the CSV file
  field: :email,            # optional
  message: "is not a valid email"
}

CSV Line Numbers (Important)

To keep error reporting user-friendly and accurate, row errors must reference the physical line number in the original file, even if empty lines are skipped.

Design decision: the parser returns rows as:

rows :: [{csv_line_number :: pos_integer(), row_map :: map()}]

Downstream logic must not recompute line numbers from row indexes.

Authorization

Enforcement points:

  1. LiveView event level: check admin permission in handle_event("start_import", ...)
  2. UI level: render import section only for admin users
  3. Static templates: public assets (no authorization needed)

Use Mv.Authorization.PermissionSets (preferred) instead of hard-coded string checks where possible.

Safety Limits

  • File size enforced by allow_upload (max_file_size)
  • Row count enforced in MemberCSV.prepare/2 before processing starts
  • Chunking is done via LiveView handle_info loop (sequential, cooperative scheduling)

Implementation Issues

Issue #1: CSV Specification & Static Template Files

Dependencies: None

Goal: Define CSV contract and add static templates.

Tasks:

  • Finalize header mapping variants
  • Document normalization rules
  • Document delimiter detection strategy
  • Create templates in priv/static/templates/ (UTF-8 with BOM)
  • Document template URLs and how to link them from LiveView
  • Document line number semantics (physical CSV line numbers)

Definition of Done:

  • Templates open cleanly in Excel/LibreOffice
  • CSV spec section complete

Issue #2: Import Service Module Skeleton

Dependencies: None

Goal: Create service API and error types.

API (recommended):

  • prepare/2 — parse + map + limit checks, returns import_state
  • process_chunk/3 — process one chunk (pure-ish), returns per-chunk results

Tasks:

  • Create lib/mv/membership/import/member_csv.ex
  • Define public function: prepare/2 (file_content, opts \\ [])
  • Define public function: process_chunk/3 (chunk_rows_with_lines, column_map, opts \\ [])
  • Define error struct: %MemberCSV.Error{csv_line_number: integer, field: atom | nil, message: String.t}
  • Document module + API

Issue #3: CSV Parsing + Delimiter Auto-Detection + BOM Handling

Dependencies: Issue #2

Goal: Parse CSV robustly with correct delimiter detection and BOM handling.

Tasks:

  • Verify/add NimbleCSV dependency ({:nimble_csv, "~> 1.0"})
  • Create lib/mv/membership/import/csv_parser.ex
  • Implement strip_bom/1 and apply it before any header handling
  • Handle \r\n and \n line endings (trim \r on header record)
  • Detect delimiter via header recognition (try ; and ,)
  • Parse CSV and return:
    • headers :: [String.t()]
    • rows :: [{csv_line_number, [String.t()]}] or directly [{csv_line_number, row_map}]
  • Skip completely empty records (but preserve correct physical line numbers)
  • Return {:ok, headers, rows} or {:error, reason}

Definition of Done:

  • BOM handling works (Excel exports)
  • Delimiter detection works reliably
  • Rows carry correct csv_line_number

Issue #4: Header Normalization + Per-Header Mapping (No Language Detection)

Dependencies: Issue #3

Goal: Map each header individually to canonical fields (normalized comparison).

Tasks:

  • Create lib/mv/membership/import/header_mapper.ex
  • Implement normalize_header/1
  • Normalize mapping variants once and compare normalized strings
  • Build column_map (canonical field -> column index)
  • Early abort if required headers missing (first_name, last_name, email)
  • Ignore unknown columns

Definition of Done:

  • English/German headers map correctly
  • Missing required columns fails fast

Issue #5: Validation (Required Fields) + Error Formatting

Dependencies: Issue #4

Goal: Validate each row and return structured, translatable errors.

Tasks:

  • Implement validate_row/3 (row_map, csv_line_number, opts)
  • Required field presence (first_name, last_name, email)
  • Email format validation (EctoCommons.EmailValidator)
  • Trim values before validation
  • Gettext-backed error messages

Issue #6: Persistence via Ash Create + Per-Row Error Capture (Chunked Processing)

Dependencies: Issue #5

Goal: Create members and capture errors per row with correct CSV line numbers.

Tasks:

  • Implement process_chunk/3 in service:
    • Input: [{csv_line_number, row_map}]
    • Validate + create sequentially
    • Collect counts + first 50 errors (per import overall; LiveView enforces cap across chunks)
  • Implement Ash error formatter helper:
    • Convert Ash.Error.Invalid into %MemberCSV.Error{}
    • Prefer field-level errors where possible (attach field atom)
    • Handle unique email constraint error as user-friendly message
  • Map row_map to Ash attrs (%{first_name: ..., ...})

Important: Do not recompute line numbers in this layer—use the ones provided by the parser.


Dependencies: Issue #6

Goal: UI section with upload, progress, results, and template links.

Tasks:

  • Render import section only for admins
  • Configure allow_upload/3:
    • .csv only, max_entries: 1, max_file_size: 10MB, auto_upload: false
  • handle_event("start_import", ...):
    • Admin permission check
    • Consume upload -> read file content
    • Call MemberCSV.prepare/2
    • Store import_state in assigns (chunks + column_map + metadata)
    • Initialize progress assigns
    • send(self(), {:process_chunk, 0})
  • handle_info({:process_chunk, idx}, socket):
    • Fetch chunk from import_state
    • Call MemberCSV.process_chunk/3
    • Merge counts/errors into progress assigns (cap errors at 50 overall)
    • Schedule next chunk (or finish and show results)
  • Results UI:
    • Success count
    • Failure count
    • Error list (line number + message + field)

Template links:

  • Link /templates/member_import_en.csv and /templates/member_import_de.csv via Phoenix static path helpers.

Issue #8: Authorization + Limits

Dependencies: None (can be parallelized)

Goal: Ensure admin-only access and enforce limits.

Tasks:

  • Admin check in start import event handler
  • File size enforced in upload config
  • Row limit enforced in MemberCSV.prepare/2 (max_rows from config)
  • Configuration:
    config :mv, csv_import: [
      max_file_size_mb: 10,
      max_rows: 1000
    ]
    

Issue #9: End-to-End LiveView Tests + Fixtures

Dependencies: Issue #7 and #8

Tasks:

  • Fixtures:
    • valid EN/DE
    • invalid
    • too many rows (1,001)
    • BOM + ; delimiter fixture
    • fixture with empty line(s) to validate correct line numbers
  • LiveView tests:
    • admin sees section, non-admin does not
    • upload + start import
    • success + error rendering
    • row limit + file size errors

Issue #10: Documentation Polish (Inline Help Text + Docs)

Dependencies: Issue #9

Tasks:

  • UI help text + translations
  • CHANGELOG entry
  • Ensure moduledocs/docs

Issue #11: Custom Field Import (Optional - v1.1)

Dependencies: Issue #10
Status: Optional

(unchanged — intentionally deferred)


Rollout & Risks

Rollout Strategy

  • Dev → Staging → Production (with anonymized real-world CSV tests)

Risks & Mitigations

Risk Impact Likelihood Mitigation
Large import timeout High Medium 10 MB + 1,000 rows, chunking via handle_info
Encoding issues Medium Medium BOM stripping, templates with BOM
Invalid CSV format Medium High Clear errors + templates
Duplicate emails Low High Ash constraint error -> user-friendly message
Performance (no background jobs) Medium Low Small limits, sequential chunk processing
Admin access bypass High Low Event-level auth + UI hiding
Data corruption High Low Per-row validation + best-effort

Appendix

Module File Structure

lib/
├── mv/
│   └── membership/
│       └── import/
│           ├── member_csv.ex          # prepare + process_chunk
│           ├── csv_parser.ex          # delimiter detection + parsing + BOM handling
│           └── header_mapper.ex       # normalization + header mapping
└── mv_web/
    └── live/
        └── global_settings_live.ex    # add import section + LV message loop

priv/
└── static/
    └── templates/
        ├── member_import_en.csv
        └── member_import_de.csv

test/
├── mv/
│   └── membership/
│       └── import/
│           ├── member_csv_test.exs
│           ├── csv_parser_test.exs
│           └── header_mapper_test.exs
└── fixtures/
    ├── member_import_en.csv
    ├── member_import_de.csv
    ├── member_import_invalid.csv
    ├── member_import_large.csv
    └── member_import_empty_lines.csv

Example Usage (LiveView)

def handle_event("start_import", _params, socket) do
  assert_admin!(socket.assigns.current_user)

  [{_name, content}] =
    consume_uploaded_entries(socket, :csv_file, fn %{path: path}, _entry ->
      {:ok, File.read!(path)}
    end)

  case Mv.Membership.Import.MemberCSV.prepare(content) do
    {:ok, import_state} ->
      socket =
        socket
        |> assign(:import_state, import_state)
        |> assign(:import_progress, %{processed: 0, inserted: 0, failed: 0, errors: []})
        |> assign(:importing?, true)

      send(self(), {:process_chunk, 0})
      {:noreply, socket}

    {:error, reason} ->
      {:noreply, put_flash(socket, :error, reason)}
  end
end

def handle_info({:process_chunk, idx}, socket) do
  %{chunks: chunks, column_map: column_map} = socket.assigns.import_state

  case Enum.at(chunks, idx) do
    nil ->
      {:noreply, assign(socket, importing?: false)}

    chunk_rows_with_lines ->
      {:ok, chunk_result} =
        Mv.Membership.Import.MemberCSV.process_chunk(chunk_rows_with_lines, column_map)

      socket = merge_progress(socket, chunk_result) # caps errors at 50 overall

      send(self(), {:process_chunk, idx + 1})
      {:noreply, socket}
  end
end

End of Implementation Plan