172 lines
10 KiB
Markdown
172 lines
10 KiB
Markdown
# CSV Member Import
|
||
|
||
Reference for how the CSV member import actually behaves. The end-to-end
|
||
LiveView test (`test/mv_web/live/import_live_test.exs`) and future maintenance
|
||
depend on the rules documented here.
|
||
|
||
**Status:** implemented (backend + LiveView UI).
|
||
|
||
Implementation:
|
||
|
||
- `lib/mv/membership/import/csv_parser.ex` — BOM stripping, delimiter detection, physical line numbering
|
||
- `lib/mv/membership/import/header_mapper.ex` — header normalization + column mapping
|
||
- `lib/mv/membership/import/column_resolver.ex` — read-only resolution of groups + fee-type columns (preview)
|
||
- `lib/mv/membership/import/member_csv.ex` — `prepare/2`, `process_chunk/4`, validation, member creation
|
||
- `lib/mv/membership/import/import_runner.ex` — orchestration glue
|
||
- `lib/mv_web/live/import_live.ex` (+ `import_live/components.ex`) — UI, state machine, chunk driving
|
||
- `lib/mv_web/controllers/import_template_controller.ex` — on-the-fly template generation
|
||
|
||
## Scope
|
||
|
||
Admin-only bulk creation of members from an uploaded CSV.
|
||
|
||
- **Create only** — no upsert/update of existing members.
|
||
- **No deduplication** — a duplicate email fails its row (unique constraint) and is reported as an error.
|
||
- **Best-effort, row-by-row** — no transactional rollback; a failed row does not abort the import.
|
||
- **No background jobs** — progress is driven via LiveView `handle_info` chunk messages.
|
||
- **Errors shown in UI only** — no error-CSV export.
|
||
|
||
Out of scope: upsert, mapping wizard, transactional all-or-nothing, error export, import history/audit.
|
||
|
||
## UI Flow
|
||
|
||
- **Route:** `/admin/import` (LiveView `MvWeb.ImportLive`). Template downloads:
|
||
`/admin/import/template/en` and `/admin/import/template/de` (dynamic controller, not static files).
|
||
- **Authorization:** requires `can?(:create, Mv.Membership.Member)`. Non-admins are
|
||
redirected with a "don't have permission" flash. The import section, the template
|
||
controller, and the `start_import` event all enforce this.
|
||
- **Upload:** `allow_upload(:csv_file, accept: .csv, max_entries: 1, auto_upload: true)`.
|
||
File size limit enforced by `max_file_size`.
|
||
- **State machine** (`@import_status`): `idle → preview → running → done|error`.
|
||
- **start_import** parses + resolves the file and transitions to **preview**. This step
|
||
is **read-only**: no members are created yet. The preview shows the column mapping,
|
||
sample rows, groups that exist vs. would be created, and fee-type/unknown-column warnings.
|
||
- **confirm_import** begins processing and creates members chunk by chunk.
|
||
- **Results:** success count, failure count, error list (each with CSV line number, message,
|
||
optional field), warnings, and a truncation notice when errors exceed the cap.
|
||
|
||
## Limits
|
||
|
||
- **Max file size:** configurable via `config :mv, csv_import: [max_file_size_mb: ...]` (enforced by `allow_upload`).
|
||
- **Max rows:** configurable via `config :mv, csv_import: [max_rows: ...]`, default **1000**, excluding header. Enforced in `MemberCSV.prepare/2`; exceeding it yields an error containing `"exceeds"`.
|
||
- **Chunk size:** 200 rows per chunk.
|
||
- **Error cap:** 50 errors collected per import overall (`failed` count stays accurate; `errors_truncated?` flag set when exceeded).
|
||
|
||
## Parsing (`CsvParser.parse/1`)
|
||
|
||
- Content must be **valid UTF-8** (else error). Empty content / empty header row are errors.
|
||
- **UTF-8 BOM is stripped first**, before any header handling.
|
||
- Line endings normalized: `\r\n`, `\r`, `\n` all handled.
|
||
- **Delimiter auto-detection:** parse the header with both `;` and `,` parsers (NimbleCSV,
|
||
quote-aware), count non-empty fields each yields, pick the higher; **`;` wins ties**;
|
||
default `;`.
|
||
- **Quoting:** double-quote quoting; `""` inside a quoted field is a literal `"`. Newlines
|
||
inside quoted fields are supported — the record keeps its **start** line number.
|
||
- **Physical line numbers:** rows are returned as `{csv_line_number, values}` where the line
|
||
number is the physical 1-based line in the file (header is line 1, first data row is line 2).
|
||
**Empty lines are skipped but do not shift numbering** — downstream code must use the
|
||
parser's line numbers, never recompute from row index. (Test asserts an invalid row after a
|
||
skipped empty line still reports its true physical line, e.g. `Line 4`.)
|
||
- Completely empty rows are skipped. An unparsable row produces an error naming its line number.
|
||
|
||
## Header Mapping & Normalization (`HeaderMapper`)
|
||
|
||
**`normalize_header/1`** (applied identically to incoming headers, mapping variants, custom
|
||
field names, group names, and fee-type names):
|
||
|
||
1. trim, lowercase
|
||
2. transliterate German chars: `ß → ss`, `ä → ae`, `ö → oe`, `ü → ue` (and uppercase forms)
|
||
3. unify hyphen variants (en dash U+2013, em dash U+2014, minus U+2212 → `-`)
|
||
4. punctuation to spaces: `_`, `()[]{}`, `/`, `\` → space
|
||
5. **remove all whitespace** (so `first name` == `firstname`)
|
||
6. final trim
|
||
|
||
Matching is on the fully normalized string.
|
||
|
||
**Required field:** `email`. Missing it aborts `prepare` with a "Missing required header" error.
|
||
|
||
**Unknown member-field columns:** ignored (no error). If an unknown column looks like it
|
||
could be a custom field that does not exist, a **warning** is emitted (import continues).
|
||
|
||
**Duplicate headers** mapping to the same canonical field (or same custom field) are an error.
|
||
|
||
### Supported member fields and header variants
|
||
|
||
Source of truth is `@member_field_variants_raw` in `header_mapper.ex`. Variants below are
|
||
illustrative; matching is via normalization, so casing/hyphen/whitespace differences all collapse.
|
||
|
||
| Canonical | Example accepted headers (EN / DE) | Notes |
|
||
|---|---|---|
|
||
| `email` (required) | email, e-mail, e_mail, mail, e-mail-adresse / E-Mail | |
|
||
| `first_name` | first name, firstname / Vorname | |
|
||
| `last_name` | last name, lastname, surname / Nachname, Familienname | |
|
||
| `join_date` | join date / Beitrittsdatum | ISO-8601 date |
|
||
| `exit_date` | exit date / Austrittsdatum | ISO-8601 date |
|
||
| `notes` | notes / Notizen, Bemerkungen | |
|
||
| `street` | street, address / Straße, Strasse | |
|
||
| `house_number` | house number, house no / Hausnummer, Nr, Nr., Nummer | |
|
||
| `postal_code` | postal code, zip, postcode / PLZ, Postleitzahl | |
|
||
| `city` | city, town / Stadt, Ort | |
|
||
| `country` | country / Land, Staat | |
|
||
| `membership_fee_start_date` | membership fee start date, fee start / Beitragsbeginn | ISO-8601 date |
|
||
|
||
### Special relationship columns
|
||
|
||
- **groups** (headers `Groups` / `Gruppen` / `Gruppe`) — comma-separated group names. Names
|
||
matched case-insensitively against existing groups; **missing groups are auto-created** during
|
||
processing. A group-assignment failure fails that row (the member was already created).
|
||
- **membership_fee_type** (headers `Fee Type`, `fee_type`, `membership_fee_type` / `Beitragsart`)
|
||
— name matched to an existing `MembershipFeeType`. **Empty cell → default fee type** (no warning).
|
||
**Matched name → that fee type.** **Unmatched name → default fee type + warning** naming the value.
|
||
|
||
These columns are resolved against the DB read-only in `prepare` (`ColumnResolver`) for the
|
||
preview; the actual writes happen in `process_chunk`.
|
||
|
||
### Fields not importable (explicitly ignored)
|
||
|
||
- **membership_fee_status** — computed from fee cycles; not stored. Fee-status header variants
|
||
(`Membership Fee Status`, `Bezahlstatus`, `Mitgliedsbeitragsstatus`) and the DE export label
|
||
`Startdatum Mitgliedsbeitrag` are placed in the `ignored` list and never mapped. (The UI notice
|
||
names `Groups`/`Gruppen`, `Fee Type`/`Beitragsart`, and the always-ignored `Bezahlstatus`.)
|
||
|
||
## Custom Fields
|
||
|
||
- Custom field columns are matched by the custom field **name** (not slug), using the same
|
||
normalization. Member fields take priority on a name collision.
|
||
- **Custom fields must exist in Mila before import.** Unknown custom-field columns are ignored
|
||
with a warning; the import still runs.
|
||
- Empty custom-field cells create no value. Values are trimmed; type-validated per the custom
|
||
field's `value_type`:
|
||
- **string** — any text (trimmed).
|
||
- **integer** — must parse fully (`Integer.parse` with no remainder); e.g. `42`, `-10`.
|
||
- **boolean** — case-insensitive `true/false`, `1/0`, `yes/no`, `ja/nein`.
|
||
- **date** — ISO-8601 `YYYY-MM-DD`.
|
||
- **email** — validated with `EctoCommons.EmailValidator` (same checks as member email).
|
||
- A value failing type validation fails the row. Error message format:
|
||
`custom_field: <name> – expected <type>, got: <value>` (type label is the human-readable
|
||
`FieldTypes.label/1`, with format hints for boolean/date).
|
||
|
||
## Validation & Member Creation (`process_chunk/4` → `process_row`)
|
||
|
||
Per row: validate → create member → create custom-field values → assign groups. Sequential.
|
||
|
||
- **Email** is required and format-validated (`EctoCommons.EmailValidator`, `Mv.Constants.email_validator_checks()`) on a trimmed value. All string member values are trimmed.
|
||
- **Date fields** (`join_date`, `exit_date`, `membership_fee_start_date`): empty/blank strings are converted to `nil` so Ash accepts them.
|
||
- Member created via `Mv.Membership.create_member/2`. Custom field values are passed as
|
||
`custom_field_values` (Ash union `_union_type`/`_union_value` format), omitted when none.
|
||
- **Errors** are `%MemberCSV.Error{csv_line_number, field, message}`:
|
||
- `csv_line_number` is the physical line (1-based); never recomputed in this layer.
|
||
- Validation errors get `field: :email`; Ash errors prefer the field-level error.
|
||
- **Duplicate email** (unique constraint) is surfaced as a friendly
|
||
`"email <addr> has already been taken"` message.
|
||
- **Error capping** (`max_errors`, default 50, tracked across chunks via `existing_error_count`):
|
||
once the cap is hit, no further errors are collected but **all rows are still processed** and
|
||
the `failed` count stays accurate; `errors_truncated?` is set and the UI shows a truncation notice.
|
||
|
||
## Templates (`ImportTemplateController`)
|
||
|
||
- Generated on the fly (not static files), gated by `can?(:create, Member)`.
|
||
- One header row: standard member columns (localized EN/DE) + `Groups`/`Gruppen` +
|
||
`Fee Type`/`Beitragsart` + **every existing custom field name** appended, then one example row.
|
||
- Semicolon-delimited, RFC-4180 quoting; fields run through `MembersCSV.safe_cell/1` to
|
||
neutralize spreadsheet formula injection (e.g. a custom-field name like `=HYPERLINK(...)`).
|