mitgliederverwaltung/docs/csv-member-import-v1.md

10 KiB
Raw Permalink Blame History

CSV Member Import

Reference for how the CSV member import actually behaves. The end-to-end LiveView test (test/mv_web/live/import_live_test.exs) and future maintenance depend on the rules documented here.

Status: implemented (backend + LiveView UI).

Implementation:

  • lib/mv/membership/import/csv_parser.ex — BOM stripping, delimiter detection, physical line numbering
  • lib/mv/membership/import/header_mapper.ex — header normalization + column mapping
  • lib/mv/membership/import/column_resolver.ex — read-only resolution of groups + fee-type columns (preview)
  • lib/mv/membership/import/member_csv.exprepare/2, process_chunk/4, validation, member creation
  • lib/mv/membership/import/import_runner.ex — orchestration glue
  • lib/mv_web/live/import_live.ex (+ import_live/components.ex) — UI, state machine, chunk driving
  • lib/mv_web/controllers/import_template_controller.ex — on-the-fly template generation

Scope

Admin-only bulk creation of members from an uploaded CSV.

  • Create only — no upsert/update of existing members.
  • No deduplication — a duplicate email fails its row (unique constraint) and is reported as an error.
  • Best-effort, row-by-row — no transactional rollback; a failed row does not abort the import.
  • No background jobs — progress is driven via LiveView handle_info chunk messages.
  • Errors shown in UI only — no error-CSV export.

Out of scope: upsert, mapping wizard, transactional all-or-nothing, error export, import history/audit.

UI Flow

  • Route: /admin/import (LiveView MvWeb.ImportLive). Template downloads: /admin/import/template/en and /admin/import/template/de (dynamic controller, not static files).
  • Authorization: requires can?(:create, Mv.Membership.Member). Non-admins are redirected with a "don't have permission" flash. The import section, the template controller, and the start_import event all enforce this.
  • Upload: allow_upload(:csv_file, accept: .csv, max_entries: 1, auto_upload: true). File size limit enforced by max_file_size.
  • State machine (@import_status): idle → preview → running → done|error.
    • start_import parses + resolves the file and transitions to preview. This step is read-only: no members are created yet. The preview shows the column mapping, sample rows, groups that exist vs. would be created, and fee-type/unknown-column warnings.
    • confirm_import begins processing and creates members chunk by chunk.
  • Results: success count, failure count, error list (each with CSV line number, message, optional field), warnings, and a truncation notice when errors exceed the cap.

Limits

  • Max file size: configurable via config :mv, csv_import: [max_file_size_mb: ...] (enforced by allow_upload).
  • Max rows: configurable via config :mv, csv_import: [max_rows: ...], default 1000, excluding header. Enforced in MemberCSV.prepare/2; exceeding it yields an error containing "exceeds".
  • Chunk size: 200 rows per chunk.
  • Error cap: 50 errors collected per import overall (failed count stays accurate; errors_truncated? flag set when exceeded).

Parsing (CsvParser.parse/1)

  • Content must be valid UTF-8 (else error). Empty content / empty header row are errors.
  • UTF-8 BOM is stripped first, before any header handling.
  • Line endings normalized: \r\n, \r, \n all handled.
  • Delimiter auto-detection: parse the header with both ; and , parsers (NimbleCSV, quote-aware), count non-empty fields each yields, pick the higher; ; wins ties; default ;.
  • Quoting: double-quote quoting; "" inside a quoted field is a literal ". Newlines inside quoted fields are supported — the record keeps its start line number.
  • Physical line numbers: rows are returned as {csv_line_number, values} where the line number is the physical 1-based line in the file (header is line 1, first data row is line 2). Empty lines are skipped but do not shift numbering — downstream code must use the parser's line numbers, never recompute from row index. (Test asserts an invalid row after a skipped empty line still reports its true physical line, e.g. Line 4.)
  • Completely empty rows are skipped. An unparsable row produces an error naming its line number.

Header Mapping & Normalization (HeaderMapper)

normalize_header/1 (applied identically to incoming headers, mapping variants, custom field names, group names, and fee-type names):

  1. trim, lowercase
  2. transliterate German chars: ß → ss, ä → ae, ö → oe, ü → ue (and uppercase forms)
  3. unify hyphen variants (en dash U+2013, em dash U+2014, minus U+2212 → -)
  4. punctuation to spaces: _, ()[]{}, /, \ → space
  5. remove all whitespace (so first name == firstname)
  6. final trim

Matching is on the fully normalized string.

Required field: email. Missing it aborts prepare with a "Missing required header" error.

Unknown member-field columns: ignored (no error). If an unknown column looks like it could be a custom field that does not exist, a warning is emitted (import continues).

Duplicate headers mapping to the same canonical field (or same custom field) are an error.

Supported member fields and header variants

Source of truth is @member_field_variants_raw in header_mapper.ex. Variants below are illustrative; matching is via normalization, so casing/hyphen/whitespace differences all collapse.

Canonical Example accepted headers (EN / DE) Notes
email (required) email, e-mail, e_mail, mail, e-mail-adresse / E-Mail
first_name first name, firstname / Vorname
last_name last name, lastname, surname / Nachname, Familienname
join_date join date / Beitrittsdatum ISO-8601 date
exit_date exit date / Austrittsdatum ISO-8601 date
notes notes / Notizen, Bemerkungen
street street, address / Straße, Strasse
house_number house number, house no / Hausnummer, Nr, Nr., Nummer
postal_code postal code, zip, postcode / PLZ, Postleitzahl
city city, town / Stadt, Ort
country country / Land, Staat
membership_fee_start_date membership fee start date, fee start / Beitragsbeginn ISO-8601 date

Special relationship columns

  • groups (headers Groups / Gruppen / Gruppe) — comma-separated group names. Names matched case-insensitively against existing groups; missing groups are auto-created during processing. A group-assignment failure fails that row (the member was already created).
  • membership_fee_type (headers Fee Type, fee_type, membership_fee_type / Beitragsart) — name matched to an existing MembershipFeeType. Empty cell → default fee type (no warning). Matched name → that fee type. Unmatched name → default fee type + warning naming the value.

These columns are resolved against the DB read-only in prepare (ColumnResolver) for the preview; the actual writes happen in process_chunk.

Fields not importable (explicitly ignored)

  • membership_fee_status — computed from fee cycles; not stored. Fee-status header variants (Membership Fee Status, Bezahlstatus, Mitgliedsbeitragsstatus) and the DE export label Startdatum Mitgliedsbeitrag are placed in the ignored list and never mapped. (The UI notice names Groups/Gruppen, Fee Type/Beitragsart, and the always-ignored Bezahlstatus.)

Custom Fields

  • Custom field columns are matched by the custom field name (not slug), using the same normalization. Member fields take priority on a name collision.
  • Custom fields must exist in Mila before import. Unknown custom-field columns are ignored with a warning; the import still runs.
  • Empty custom-field cells create no value. Values are trimmed; type-validated per the custom field's value_type:
    • string — any text (trimmed).
    • integer — must parse fully (Integer.parse with no remainder); e.g. 42, -10.
    • boolean — case-insensitive true/false, 1/0, yes/no, ja/nein.
    • date — ISO-8601 YYYY-MM-DD.
    • email — validated with EctoCommons.EmailValidator (same checks as member email).
  • A value failing type validation fails the row. Error message format: custom_field: <name> expected <type>, got: <value> (type label is the human-readable FieldTypes.label/1, with format hints for boolean/date).

Validation & Member Creation (process_chunk/4process_row)

Per row: validate → create member → create custom-field values → assign groups. Sequential.

  • Email is required and format-validated (EctoCommons.EmailValidator, Mv.Constants.email_validator_checks()) on a trimmed value. All string member values are trimmed.
  • Date fields (join_date, exit_date, membership_fee_start_date): empty/blank strings are converted to nil so Ash accepts them.
  • Member created via Mv.Membership.create_member/2. Custom field values are passed as custom_field_values (Ash union _union_type/_union_value format), omitted when none.
  • Errors are %MemberCSV.Error{csv_line_number, field, message}:
    • csv_line_number is the physical line (1-based); never recomputed in this layer.
    • Validation errors get field: :email; Ash errors prefer the field-level error.
    • Duplicate email (unique constraint) is surfaced as a friendly "email <addr> has already been taken" message.
  • Error capping (max_errors, default 50, tracked across chunks via existing_error_count): once the cap is hit, no further errors are collected but all rows are still processed and the failed count stays accurate; errors_truncated? is set and the UI shows a truncation notice.

Templates (ImportTemplateController)

  • Generated on the fly (not static files), gated by can?(:create, Member).
  • One header row: standard member columns (localized EN/DE) + Groups/Gruppen + Fee Type/Beitragsart + every existing custom field name appended, then one example row.
  • Semicolon-delimited, RFC-4180 quoting; fields run through MembersCSV.safe_cell/1 to neutralize spreadsheet formula injection (e.g. a custom-field name like =HYPERLINK(...)).