mitgliederverwaltung/docs/csv-member-import-v1.md

172 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# CSV Member Import
Reference for how the CSV member import actually behaves. The end-to-end
LiveView test (`test/mv_web/live/import_live_test.exs`) and future maintenance
depend on the rules documented here.
**Status:** implemented (backend + LiveView UI).
Implementation:
- `lib/mv/membership/import/csv_parser.ex` — BOM stripping, delimiter detection, physical line numbering
- `lib/mv/membership/import/header_mapper.ex` — header normalization + column mapping
- `lib/mv/membership/import/column_resolver.ex` — read-only resolution of groups + fee-type columns (preview)
- `lib/mv/membership/import/member_csv.ex``prepare/2`, `process_chunk/4`, validation, member creation
- `lib/mv/membership/import/import_runner.ex` — orchestration glue
- `lib/mv_web/live/import_live.ex` (+ `import_live/components.ex`) — UI, state machine, chunk driving
- `lib/mv_web/controllers/import_template_controller.ex` — on-the-fly template generation
## Scope
Admin-only bulk creation of members from an uploaded CSV.
- **Create only** — no upsert/update of existing members.
- **No deduplication** — a duplicate email fails its row (unique constraint) and is reported as an error.
- **Best-effort, row-by-row** — no transactional rollback; a failed row does not abort the import.
- **No background jobs** — progress is driven via LiveView `handle_info` chunk messages.
- **Errors shown in UI only** — no error-CSV export.
Out of scope: upsert, mapping wizard, transactional all-or-nothing, error export, import history/audit.
## UI Flow
- **Route:** `/admin/import` (LiveView `MvWeb.ImportLive`). Template downloads:
`/admin/import/template/en` and `/admin/import/template/de` (dynamic controller, not static files).
- **Authorization:** requires `can?(:create, Mv.Membership.Member)`. Non-admins are
redirected with a "don't have permission" flash. The import section, the template
controller, and the `start_import` event all enforce this.
- **Upload:** `allow_upload(:csv_file, accept: .csv, max_entries: 1, auto_upload: true)`.
File size limit enforced by `max_file_size`.
- **State machine** (`@import_status`): `idle → preview → running → done|error`.
- **start_import** parses + resolves the file and transitions to **preview**. This step
is **read-only**: no members are created yet. The preview shows the column mapping,
sample rows, groups that exist vs. would be created, and fee-type/unknown-column warnings.
- **confirm_import** begins processing and creates members chunk by chunk.
- **Results:** success count, failure count, error list (each with CSV line number, message,
optional field), warnings, and a truncation notice when errors exceed the cap.
## Limits
- **Max file size:** configurable via `config :mv, csv_import: [max_file_size_mb: ...]` (enforced by `allow_upload`).
- **Max rows:** configurable via `config :mv, csv_import: [max_rows: ...]`, default **1000**, excluding header. Enforced in `MemberCSV.prepare/2`; exceeding it yields an error containing `"exceeds"`.
- **Chunk size:** 200 rows per chunk.
- **Error cap:** 50 errors collected per import overall (`failed` count stays accurate; `errors_truncated?` flag set when exceeded).
## Parsing (`CsvParser.parse/1`)
- Content must be **valid UTF-8** (else error). Empty content / empty header row are errors.
- **UTF-8 BOM is stripped first**, before any header handling.
- Line endings normalized: `\r\n`, `\r`, `\n` all handled.
- **Delimiter auto-detection:** parse the header with both `;` and `,` parsers (NimbleCSV,
quote-aware), count non-empty fields each yields, pick the higher; **`;` wins ties**;
default `;`.
- **Quoting:** double-quote quoting; `""` inside a quoted field is a literal `"`. Newlines
inside quoted fields are supported — the record keeps its **start** line number.
- **Physical line numbers:** rows are returned as `{csv_line_number, values}` where the line
number is the physical 1-based line in the file (header is line 1, first data row is line 2).
**Empty lines are skipped but do not shift numbering** — downstream code must use the
parser's line numbers, never recompute from row index. (Test asserts an invalid row after a
skipped empty line still reports its true physical line, e.g. `Line 4`.)
- Completely empty rows are skipped. An unparsable row produces an error naming its line number.
## Header Mapping & Normalization (`HeaderMapper`)
**`normalize_header/1`** (applied identically to incoming headers, mapping variants, custom
field names, group names, and fee-type names):
1. trim, lowercase
2. transliterate German chars: `ß → ss`, `ä → ae`, `ö → oe`, `ü → ue` (and uppercase forms)
3. unify hyphen variants (en dash U+2013, em dash U+2014, minus U+2212 → `-`)
4. punctuation to spaces: `_`, `()[]{}`, `/`, `\` → space
5. **remove all whitespace** (so `first name` == `firstname`)
6. final trim
Matching is on the fully normalized string.
**Required field:** `email`. Missing it aborts `prepare` with a "Missing required header" error.
**Unknown member-field columns:** ignored (no error). If an unknown column looks like it
could be a custom field that does not exist, a **warning** is emitted (import continues).
**Duplicate headers** mapping to the same canonical field (or same custom field) are an error.
### Supported member fields and header variants
Source of truth is `@member_field_variants_raw` in `header_mapper.ex`. Variants below are
illustrative; matching is via normalization, so casing/hyphen/whitespace differences all collapse.
| Canonical | Example accepted headers (EN / DE) | Notes |
|---|---|---|
| `email` (required) | email, e-mail, e_mail, mail, e-mail-adresse / E-Mail | |
| `first_name` | first name, firstname / Vorname | |
| `last_name` | last name, lastname, surname / Nachname, Familienname | |
| `join_date` | join date / Beitrittsdatum | ISO-8601 date |
| `exit_date` | exit date / Austrittsdatum | ISO-8601 date |
| `notes` | notes / Notizen, Bemerkungen | |
| `street` | street, address / Straße, Strasse | |
| `house_number` | house number, house no / Hausnummer, Nr, Nr., Nummer | |
| `postal_code` | postal code, zip, postcode / PLZ, Postleitzahl | |
| `city` | city, town / Stadt, Ort | |
| `country` | country / Land, Staat | |
| `membership_fee_start_date` | membership fee start date, fee start / Beitragsbeginn | ISO-8601 date |
### Special relationship columns
- **groups** (headers `Groups` / `Gruppen` / `Gruppe`) — comma-separated group names. Names
matched case-insensitively against existing groups; **missing groups are auto-created** during
processing. A group-assignment failure fails that row (the member was already created).
- **membership_fee_type** (headers `Fee Type`, `fee_type`, `membership_fee_type` / `Beitragsart`)
— name matched to an existing `MembershipFeeType`. **Empty cell → default fee type** (no warning).
**Matched name → that fee type.** **Unmatched name → default fee type + warning** naming the value.
These columns are resolved against the DB read-only in `prepare` (`ColumnResolver`) for the
preview; the actual writes happen in `process_chunk`.
### Fields not importable (explicitly ignored)
- **membership_fee_status** — computed from fee cycles; not stored. Fee-status header variants
(`Membership Fee Status`, `Bezahlstatus`, `Mitgliedsbeitragsstatus`) and the DE export label
`Startdatum Mitgliedsbeitrag` are placed in the `ignored` list and never mapped. (The UI notice
names `Groups`/`Gruppen`, `Fee Type`/`Beitragsart`, and the always-ignored `Bezahlstatus`.)
## Custom Fields
- Custom field columns are matched by the custom field **name** (not slug), using the same
normalization. Member fields take priority on a name collision.
- **Custom fields must exist in Mila before import.** Unknown custom-field columns are ignored
with a warning; the import still runs.
- Empty custom-field cells create no value. Values are trimmed; type-validated per the custom
field's `value_type`:
- **string** — any text (trimmed).
- **integer** — must parse fully (`Integer.parse` with no remainder); e.g. `42`, `-10`.
- **boolean** — case-insensitive `true/false`, `1/0`, `yes/no`, `ja/nein`.
- **date** — ISO-8601 `YYYY-MM-DD`.
- **email** — validated with `EctoCommons.EmailValidator` (same checks as member email).
- A value failing type validation fails the row. Error message format:
`custom_field: <name> expected <type>, got: <value>` (type label is the human-readable
`FieldTypes.label/1`, with format hints for boolean/date).
## Validation & Member Creation (`process_chunk/4` → `process_row`)
Per row: validate → create member → create custom-field values → assign groups. Sequential.
- **Email** is required and format-validated (`EctoCommons.EmailValidator`, `Mv.Constants.email_validator_checks()`) on a trimmed value. All string member values are trimmed.
- **Date fields** (`join_date`, `exit_date`, `membership_fee_start_date`): empty/blank strings are converted to `nil` so Ash accepts them.
- Member created via `Mv.Membership.create_member/2`. Custom field values are passed as
`custom_field_values` (Ash union `_union_type`/`_union_value` format), omitted when none.
- **Errors** are `%MemberCSV.Error{csv_line_number, field, message}`:
- `csv_line_number` is the physical line (1-based); never recomputed in this layer.
- Validation errors get `field: :email`; Ash errors prefer the field-level error.
- **Duplicate email** (unique constraint) is surfaced as a friendly
`"email <addr> has already been taken"` message.
- **Error capping** (`max_errors`, default 50, tracked across chunks via `existing_error_count`):
once the cap is hit, no further errors are collected but **all rows are still processed** and
the `failed` count stays accurate; `errors_truncated?` is set and the UI shows a truncation notice.
## Templates (`ImportTemplateController`)
- Generated on the fly (not static files), gated by `can?(:create, Member)`.
- One header row: standard member columns (localized EN/DE) + `Groups`/`Gruppen` +
`Fee Type`/`Beitragsart` + **every existing custom field name** appended, then one example row.
- Semicolon-delimited, RFC-4180 quoting; fields run through `MembersCSV.safe_cell/1` to
neutralize spreadsheet formula injection (e.g. a custom-field name like `=HYPERLINK(...)`).