Merge pull request 'Implements error capping' (#356) from feature/334_Ash_persistence into main

Reviewed-on: #356
2026-01-19 13:14:59 +01:00 · fef2cce283
commit fef2cce283
parent c31392e4fe ac0e272cca
4 changed files with 280 additions and 69 deletions
--- a/docs/csv-member-import-v1.md
+++ b/docs/csv-member-import-v1.md
@@ -2,10 +2,29 @@

 **Version:** 1.0  
 **Date:** 2025-01-XX  
-**Status:** Ready for Implementation  
+**Status:** In Progress (Backend Complete, UI Pending)  
 **Related Documents:**
 - [Feature Roadmap](./feature-roadmap.md) - Overall feature planning

+## Implementation Status
+
+**Completed Issues:**
+- ✅ Issue #1: CSV Specification & Static Template Files
+- ✅ Issue #2: Import Service Module Skeleton
+- ✅ Issue #3: CSV Parsing + Delimiter Auto-Detection + BOM Handling
+- ✅ Issue #4: Header Normalization + Per-Header Mapping
+- ✅ Issue #5: Validation (Required Fields) + Error Formatting
+- ✅ Issue #6: Persistence via Ash Create + Per-Row Error Capture (with Error-Capping)
+- ✅ Issue #11: Custom Field Import (Backend)
+
+**In Progress / Pending:**
+- ⏳ Issue #7: Admin Global Settings LiveView UI (Upload + Start Import + Results)
+- ⏳ Issue #8: Authorization + Limits
+- ⏳ Issue #9: End-to-End LiveView Tests + Fixtures
+- ⏳ Issue #10: Documentation Polish
+
+**Latest Update:** Error-Capping in `process_chunk/4` implemented (2025-01-XX)
+
 ---

 ## Table of Contents
@@ -332,19 +351,24 @@ Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string c

 **Dependencies:** None

+**Status:** ✅ **COMPLETED**
+
 **Goal:** Define CSV contract and add static templates.

 **Tasks:**
-- [ ] Finalize header mapping variants
-- [ ] Document normalization rules
-- [ ] Document delimiter detection strategy
-- [ ] Create templates in `priv/static/templates/` (UTF-8 with BOM)
-- [ ] Document template URLs and how to link them from LiveView
-- [ ] Document line number semantics (physical CSV line numbers)
+- [x] Finalize header mapping variants
+- [x] Document normalization rules
+- [x] Document delimiter detection strategy
+- [x] Create templates in `priv/static/templates/` (UTF-8 with BOM)
+  - `member_import_en.csv` with English headers
+  - `member_import_de.csv` with German headers
+- [x] Document template URLs and how to link them from LiveView
+- [x] Document line number semantics (physical CSV line numbers)
+- [x] Templates included in `MvWeb.static_paths()` configuration

 **Definition of Done:**
-- [ ] Templates open cleanly in Excel/LibreOffice
-- [ ] CSV spec section complete
+- [x] Templates open cleanly in Excel/LibreOffice
+- [x] CSV spec section complete

 ---

@@ -352,18 +376,20 @@ Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string c

 **Dependencies:** None

+**Status:** ✅ **COMPLETED**
+
 **Goal:** Create service API and error types.

 **API (recommended):**
 - `prepare/2` — parse + map + limit checks, returns import_state
-- `process_chunk/3` — process one chunk (pure-ish), returns per-chunk results
+- `process_chunk/4` — process one chunk (pure-ish), returns per-chunk results

 **Tasks:**
-- [ ] Create `lib/mv/membership/import/member_csv.ex`
-- [ ] Define public function: `prepare/2 (file_content, opts \\ [])`
-- [ ] Define public function: `process_chunk/3 (chunk_rows_with_lines, column_map, opts \\ [])`
-- [ ] Define error struct: `%MemberCSV.Error{csv_line_number: integer, field: atom | nil, message: String.t}`
-- [ ] Document module + API
+- [x] Create `lib/mv/membership/import/member_csv.ex`
+- [x] Define public function: `prepare/2 (file_content, opts \\ [])`
+- [x] Define public function: `process_chunk/4 (chunk_rows_with_lines, column_map, custom_field_map, opts \\ [])`
+- [x] Define error struct: `%MemberCSV.Error{csv_line_number: integer, field: atom | nil, message: String.t}`
+- [x] Document module + API

 ---

@@ -371,24 +397,26 @@ Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string c

 **Dependencies:** Issue #2

+**Status:** ✅ **COMPLETED**
+
 **Goal:** Parse CSV robustly with correct delimiter detection and BOM handling.

 **Tasks:**
-- [ ] Verify/add NimbleCSV dependency (`{:nimble_csv, "~> 1.0"}`)
-- [ ] Create `lib/mv/membership/import/csv_parser.ex`
-- [ ] Implement `strip_bom/1` and apply it **before** any header handling
-- [ ] Handle `\r\n` and `\n` line endings (trim `\r` on header record)
-- [ ] Detect delimiter via header recognition (try `;` and `,`)
-- [ ] Parse CSV and return:
+- [x] Verify/add NimbleCSV dependency (`{:nimble_csv, "~> 1.0"}`)
+- [x] Create `lib/mv/membership/import/csv_parser.ex`
+- [x] Implement `strip_bom/1` and apply it **before** any header handling
+- [x] Handle `\r\n` and `\n` line endings (trim `\r` on header record)
+- [x] Detect delimiter via header recognition (try `;` and `,`)
+- [x] Parse CSV and return:
  - `headers :: [String.t()]`
-  - `rows :: [{csv_line_number, [String.t()]}]` or directly `[{csv_line_number, row_map}]`
-- [ ] Skip completely empty records (but preserve correct physical line numbers)
-- [ ] Return `{:ok, headers, rows}` or `{:error, reason}`
+  - `rows :: [{csv_line_number, [String.t()]}]` with correct physical line numbers
+- [x] Skip completely empty records (but preserve correct physical line numbers)
+- [x] Return `{:ok, headers, rows}` or `{:error, reason}`

 **Definition of Done:**
-- [ ] BOM handling works (Excel exports)
-- [ ] Delimiter detection works reliably
-- [ ] Rows carry correct `csv_line_number`
+- [x] BOM handling works (Excel exports)
+- [x] Delimiter detection works reliably
+- [x] Rows carry correct `csv_line_number`

 ---

@@ -396,20 +424,22 @@ Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string c

 **Dependencies:** Issue #3

+**Status:** ✅ **COMPLETED**
+
 **Goal:** Map each header individually to canonical fields (normalized comparison).

 **Tasks:**
-- [ ] Create `lib/mv/membership/import/header_mapper.ex`
-- [ ] Implement `normalize_header/1`
-- [ ] Normalize mapping variants once and compare normalized strings
-- [ ] Build `column_map` (canonical field -> column index)
-- [ ] **Early abort if required headers missing** (`email`)
-- [ ] Ignore unknown columns (member fields only)
-- [ ] **Separate custom field column detection** (by name, with normalization)
+- [x] Create `lib/mv/membership/import/header_mapper.ex`
+- [x] Implement `normalize_header/1`
+- [x] Normalize mapping variants once and compare normalized strings
+- [x] Build `column_map` (canonical field -> column index)
+- [x] **Early abort if required headers missing** (`email`)
+- [x] Ignore unknown columns (member fields only)
+- [x] **Separate custom field column detection** (by name, with normalization)

 **Definition of Done:**
-- [ ] English/German headers map correctly
-- [ ] Missing required columns fails fast
+- [x] English/German headers map correctly
+- [x] Missing required columns fails fast

 ---

@@ -417,14 +447,16 @@ Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string c

 **Dependencies:** Issue #4

+**Status:** ✅ **COMPLETED**
+
 **Goal:** Validate each row and return structured, translatable errors.

 **Tasks:**
-- [ ] Implement `validate_row/3 (row_map, csv_line_number, opts)`
-- [ ] Required field presence (`email`)
-- [ ] Email format validation (EctoCommons.EmailValidator)
-- [ ] Trim values before validation
-- [ ] Gettext-backed error messages
+- [x] Implement `validate_row/3 (row_map, csv_line_number, opts)`
+- [x] Required field presence (`email`)
+- [x] Email format validation (EctoCommons.EmailValidator)
+- [x] Trim values before validation
+- [x] Gettext-backed error messages

 ---

@@ -432,21 +464,32 @@ Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string c

 **Dependencies:** Issue #5

+**Status:** ✅ **COMPLETED**
+
 **Goal:** Create members and capture errors per row with correct CSV line numbers.

 **Tasks:**
-- [ ] Implement `process_chunk/3` in service:
+- [x] Implement `process_chunk/4` in service:
  - Input: `[{csv_line_number, row_map}]`
  - Validate + create sequentially
  - Collect counts + first 50 errors (per import overall; LiveView enforces cap across chunks)
-- [ ] Implement Ash error formatter helper:
+  - **Error-Capping:** Supports `existing_error_count` and `max_errors` in opts (default: 50)
+  - **Error-Capping:** Only collects errors if under limit, but continues processing all rows
+  - **Error-Capping:** `failed` count is always accurate, even when errors are capped
+- [x] Implement Ash error formatter helper:
  - Convert `Ash.Error.Invalid` into `%MemberCSV.Error{}`
  - Prefer field-level errors where possible (attach `field` atom)
  - Handle unique email constraint error as user-friendly message
-- [ ] Map row_map to Ash attrs (`%{first_name: ..., ...}`)
+- [x] Map row_map to Ash attrs (`%{first_name: ..., ...}`)
+- [x] Custom field value processing and creation

 **Important:** **Do not recompute line numbers** in this layer—use the ones provided by the parser.

+**Implementation Notes:**
+- `process_chunk/4` accepts `opts` with `existing_error_count` and `max_errors` for error capping across chunks
+- Error capping respects the limit per import overall (not per chunk)
+- Processing continues even after error limit is reached (for accurate counts)
+
 ---

 ### Issue #7: Admin Global Settings LiveView UI (Upload + Start Import + Results + Template Links)
@@ -546,6 +589,8 @@ Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string c

 **Priority:** High (Core v1 Feature)

+**Status:** ✅ **COMPLETED** (Backend Implementation)
+
 **Goal:** Support importing custom field values from CSV columns. Custom fields should exist in Mila before import for best results.

 **Important Requirements:**
@@ -555,27 +600,32 @@ Use `Mv.Authorization.PermissionSets` (preferred) instead of hard-coded string c
 - Unknown custom field columns (non-existent names) will be ignored with a warning - import continues

 **Tasks:**
-- [ ] Extend `header_mapper.ex` to detect custom field columns by name (using same normalization as member fields)
-- [ ] Query existing custom fields during `prepare/2` to map custom field columns
-- [ ] Collect unknown custom field columns and add warning messages (don't fail import)
-- [ ] Map custom field CSV values to `CustomFieldValue` creation in `process_chunk/3`
-- [ ] Handle custom field type validation (string, integer, boolean, date, email)
-- [ ] Create `CustomFieldValue` records linked to members during import
-- [ ] Update error messages to include custom field validation errors
-- [ ] Add UI help text explaining custom field requirements:
+- [x] Extend `header_mapper.ex` to detect custom field columns by name (using same normalization as member fields)
+- [x] Query existing custom fields during `prepare/2` to map custom field columns
+- [x] Collect unknown custom field columns and add warning messages (don't fail import)
+- [x] Map custom field CSV values to `CustomFieldValue` creation in `process_chunk/4`
+- [x] Handle custom field type validation (string, integer, boolean, date, email)
+- [x] Create `CustomFieldValue` records linked to members during import
+- [ ] Update error messages to include custom field validation errors (if needed)
+- [ ] Add UI help text explaining custom field requirements (pending Issue #7):
  - "Custom fields must be created in Mila before importing"
  - "Use the custom field name as the CSV column header (same normalization as member fields)"
  - Link to custom fields management section
-- [ ] Update CSV templates documentation to explain custom field columns
-- [ ] Add tests for custom field import (valid, invalid name, type validation, warning for unknown)
+- [ ] Update CSV templates documentation to explain custom field columns (pending Issue #1)
+- [x] Add tests for custom field import (valid, invalid name, type validation, warning for unknown)

 **Definition of Done:**
-- [ ] Custom field columns are recognized by name (with normalization)
-- [ ] Warning messages shown for unknown custom field columns (import continues)
-- [ ] Custom field values are created and linked to members
-- [ ] Type validation works for all custom field types
-- [ ] UI clearly explains custom field requirements
-- [ ] Tests cover custom field import scenarios (including warning for unknown names)
+- [x] Custom field columns are recognized by name (with normalization)
+- [x] Warning messages shown for unknown custom field columns (import continues)
+- [x] Custom field values are created and linked to members
+- [x] Type validation works for all custom field types
+- [ ] UI clearly explains custom field requirements (pending Issue #7)
+- [x] Tests cover custom field import scenarios (including warning for unknown names)
+
+**Implementation Notes:**
+- Custom field lookup is built in `prepare/2` and passed via `custom_field_lookup` in opts
+- Custom field values are formatted according to type in `format_custom_field_value/2`
+- Unknown custom field columns generate warnings in `import_state.warnings`

 ---

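The Issue #3 tasks in the document above (strip the BOM before any header handling, trim `\r` on the header record, detect the delimiter by header recognition with `;` and `,`) could look roughly like the sketch below. It is not the code from `csv_parser.ex`, which this diff does not show: the module name `CsvSniffSketch`, the `@known_headers` list, and the naive `String.split/2` on the header line (which ignores quoted fields) are all assumptions made for illustration.

```elixir
defmodule CsvSniffSketch do
  @moduledoc false
  # Hypothetical illustration only; the real parser module is not part of this diff.

  # Assumed subset of canonical headers; the real mapping also has German variants.
  @known_headers ~w(email first_name last_name)

  # Drop a leading UTF-8 BOM (EF BB BF) if present, as Excel exports often include one.
  def strip_bom(<<0xEF, 0xBB, 0xBF, rest::binary>>), do: rest
  def strip_bom(content) when is_binary(content), do: content

  # Pick the candidate delimiter that yields the most recognized headers.
  # On a tie, Enum.max_by/2 keeps the first candidate found, here ";".
  def detect_delimiter(content) do
    header_line =
      content
      |> strip_bom()
      |> String.split("\n", parts: 2)
      |> hd()
      |> String.trim_trailing("\r")

    Enum.max_by([";", ","], fn delimiter ->
      header_line
      |> String.split(delimiter)
      |> Enum.count(&(normalize(&1) in @known_headers))
    end)
  end

  # Assumed normalization: trim and downcase only.
  defp normalize(header), do: header |> String.trim() |> String.downcase()
end
```

Scoring by recognized headers rather than by raw delimiter counts means a stray comma inside a semicolon-separated header line does not flip the decision.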
--- a/lib/mv/membership/import/member_csv.ex
+++ b/lib/mv/membership/import/member_csv.ex
@@ -70,7 +70,8 @@ defmodule Mv.Membership.Import.MemberCSV do
  @type chunk_result :: %{
          inserted: non_neg_integer(),
          failed: non_neg_integer(),
-          errors: list(Error.t())
+          errors: list(Error.t()),
+          errors_truncated?: boolean()
        }

  alias Mv.Membership.Import.CsvParser
@@ -258,7 +259,18 @@ defmodule Mv.Membership.Import.MemberCSV do
    - `row_map` - Map with `:member` and `:custom` keys containing field values
  - `column_map` - Map of canonical field names (atoms) to column indices (for reference)
  - `custom_field_map` - Map of custom field IDs (strings) to column indices (for reference)
-  - `opts` - Optional keyword list for processing options
+  - `opts` - Optional keyword list for processing options:
+    - `:custom_field_lookup` - Map of custom field IDs to metadata (default: `%{}`)
+    - `:existing_error_count` - Number of errors already collected in previous chunks (default: `0`)
+    - `:max_errors` - Maximum number of errors to collect per import overall (default: `50`)
+
+  ## Error Capping
+
+  Errors are capped at `max_errors` per import overall. When the limit is reached:
+  - No additional errors are collected in the `errors` list
+  - Processing continues for all rows
+  - The `failed` count continues to increment correctly for all failed rows
+  - The `errors_truncated?` flag is set to `true` to indicate that additional errors were suppressed

  ## Returns

@@ -272,6 +284,11 @@ defmodule Mv.Membership.Import.MemberCSV do
      iex> custom_field_map = %{}
      iex> MemberCSV.process_chunk(chunk, column_map, custom_field_map)
      {:ok, %{inserted: 1, failed: 0, errors: []}}
+
+      iex> chunk = [{2, %{member: %{email: "invalid"}, custom: %{}}}]
+      iex> opts = [existing_error_count: 25, max_errors: 50]
+      iex> MemberCSV.process_chunk(chunk, %{}, %{}, opts)
+      {:ok, %{inserted: 0, failed: 1, errors: [%Error{}], errors_truncated?: false}}
  """
  @spec process_chunk(
          list({pos_integer(), map()}),
@@ -281,20 +298,34 @@ defmodule Mv.Membership.Import.MemberCSV do
        ) :: {:ok, chunk_result()} | {:error, String.t()}
  def process_chunk(chunk_rows_with_lines, _column_map, _custom_field_map, opts \\ []) do
    custom_field_lookup = Keyword.get(opts, :custom_field_lookup, %{})
+    existing_error_count = Keyword.get(opts, :existing_error_count, 0)
+    max_errors = Keyword.get(opts, :max_errors, 50)
+
+    {inserted, failed, errors, _collected_error_count, truncated?} =
+      Enum.reduce(chunk_rows_with_lines, {0, 0, [], 0, false}, fn {line_number, row_map},
+                                                                     {acc_inserted, acc_failed, acc_errors, acc_error_count, acc_truncated?} ->
+        current_error_count = existing_error_count + acc_error_count

-    {inserted, failed, errors} =
-      Enum.reduce(chunk_rows_with_lines, {0, 0, []}, fn {line_number, row_map},
-                                                        {acc_inserted, acc_failed, acc_errors} ->
        case process_row(row_map, line_number, custom_field_lookup) do
          {:ok, _member} ->
-            {acc_inserted + 1, acc_failed, acc_errors}
+            {acc_inserted + 1, acc_failed, acc_errors, acc_error_count, acc_truncated?}

          {:error, error} ->
-            {acc_inserted, acc_failed + 1, [error | acc_errors]}
+            new_acc_failed = acc_failed + 1
+
+            # Only collect errors if under limit
+            {new_acc_errors, new_error_count, new_truncated?} =
+              if current_error_count < max_errors do
+                {[error | acc_errors], acc_error_count + 1, acc_truncated?}
+              else
+                {acc_errors, acc_error_count, true}
+              end
+
+            {acc_inserted, new_acc_failed, new_acc_errors, new_error_count, new_truncated?}
        end
      end)

-    {:ok, %{inserted: inserted, failed: failed, errors: Enum.reverse(errors)}}
+    {:ok, %{inserted: inserted, failed: failed, errors: Enum.reverse(errors), errors_truncated?: truncated?}}
  end

  @doc """
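Because `process_chunk/4` only caps errors relative to the `existing_error_count` it is given, the caller has to carry the running total from chunk to chunk. The following is a minimal sketch of such a caller, assuming a hypothetical `ImportDriverSketch.run/3` that receives the chunk processor as a function (`process_chunk_fun`) so the example stays self-contained. None of these names exist in this PR, and the pending LiveView (Issue #7) may do this differently.

```elixir
defmodule ImportDriverSketch do
  @moduledoc false
  # Hypothetical caller, not part of this PR. `process_chunk_fun` stands in for
  # a closure around MemberCSV.process_chunk/4 and must return
  # {:ok, %{inserted: _, failed: _, errors: _, errors_truncated?: _}}.

  @default_max_errors 50

  def run(chunks, process_chunk_fun, max_errors \\ @default_max_errors) do
    initial = %{inserted: 0, failed: 0, errors: [], errors_truncated?: false}

    Enum.reduce(chunks, initial, fn chunk, acc ->
      # Tell the chunk processor how many errors earlier chunks already collected,
      # so the cap applies to the import as a whole rather than to each chunk.
      opts = [existing_error_count: length(acc.errors), max_errors: max_errors]

      {:ok, result} = process_chunk_fun.(chunk, opts)

      %{
        inserted: acc.inserted + result.inserted,
        # `failed` stays accurate even when the errors list is capped.
        failed: acc.failed + result.failed,
        errors: acc.errors ++ result.errors,
        errors_truncated?: acc.errors_truncated? or result.errors_truncated?
      }
    end)
  end
end
```

With the real module, `process_chunk_fun` might be something like `fn chunk, opts -> MemberCSV.process_chunk(chunk, column_map, custom_field_map, opts) end`, where `column_map` and `custom_field_map` come from `prepare/2`.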
--- a/test/mv/membership/import/member_csv_test.exs
+++ b/test/mv/membership/import/member_csv_test.exs
@@ -325,6 +325,136 @@ defmodule Mv.Membership.Import.MemberCSVTest do
      # Check that @doc exists by reading the module
      assert function_exported?(MemberCSV, :process_chunk, 4)
    end
+
+    test "error capping collects exactly 50 errors" do
+      # Create 50 rows with invalid emails
+      chunk_rows_with_lines =
+        1..50
+        |> Enum.map(fn i ->
+          {i + 1, %{member: %{email: "invalid-email-#{i}"}, custom: %{}}}
+        end)
+
+      column_map = %{email: 0}
+      custom_field_map = %{}
+      opts = [existing_error_count: 0, max_errors: 50]
+
+      assert {:ok, chunk_result} =
+               MemberCSV.process_chunk(chunk_rows_with_lines, column_map, custom_field_map, opts)
+
+      assert chunk_result.inserted == 0
+      assert chunk_result.failed == 50
+      assert length(chunk_result.errors) == 50
+    end
+
+    test "error capping collects only first 50 errors when more than 50 errors occur" do
+      # Create 60 rows with invalid emails
+      chunk_rows_with_lines =
+        1..60
+        |> Enum.map(fn i ->
+          {i + 1, %{member: %{email: "invalid-email-#{i}"}, custom: %{}}}
+        end)
+
+      column_map = %{email: 0}
+      custom_field_map = %{}
+      opts = [existing_error_count: 0, max_errors: 50]
+
+      assert {:ok, chunk_result} =
+               MemberCSV.process_chunk(chunk_rows_with_lines, column_map, custom_field_map, opts)
+
+      assert chunk_result.inserted == 0
+      assert chunk_result.failed == 60
+      assert length(chunk_result.errors) == 50
+    end
+
+    test "error capping respects existing_error_count" do
+      # Create 30 rows with invalid emails
+      chunk_rows_with_lines =
+        1..30
+        |> Enum.map(fn i ->
+          {i + 1, %{member: %{email: "invalid-email-#{i}"}, custom: %{}}}
+        end)
+
+      column_map = %{email: 0}
+      custom_field_map = %{}
+      opts = [existing_error_count: 25, max_errors: 50]
+
+      assert {:ok, chunk_result} =
+               MemberCSV.process_chunk(chunk_rows_with_lines, column_map, custom_field_map, opts)
+
+      assert chunk_result.inserted == 0
+      assert chunk_result.failed == 30
+      # Should only collect 25 errors (25 existing + 25 new = 50 limit)
+      assert length(chunk_result.errors) == 25
+    end
+
+    test "error capping collects no errors when limit already reached" do
+      # Create 10 rows with invalid emails
+      chunk_rows_with_lines =
+        1..10
+        |> Enum.map(fn i ->
+          {i + 1, %{member: %{email: "invalid-email-#{i}"}, custom: %{}}}
+        end)
+
+      column_map = %{email: 0}
+      custom_field_map = %{}
+      opts = [existing_error_count: 50, max_errors: 50]
+
+      assert {:ok, chunk_result} =
+               MemberCSV.process_chunk(chunk_rows_with_lines, column_map, custom_field_map, opts)
+
+      assert chunk_result.inserted == 0
+      assert chunk_result.failed == 10
+      assert length(chunk_result.errors) == 0
+    end
+
+    test "error capping with mixed success and failure" do
+      # Create 100 rows: 30 valid, 70 invalid
+      valid_rows =
+        1..30
+        |> Enum.map(fn i ->
+          {i + 1, %{member: %{email: "valid#{i}@example.com"}, custom: %{}}}
+        end)
+
+      invalid_rows =
+        31..100
+        |> Enum.map(fn i ->
+          {i + 1, %{member: %{email: "invalid-email-#{i}"}, custom: %{}}}
+        end)
+
+      chunk_rows_with_lines = valid_rows ++ invalid_rows
+
+      column_map = %{email: 0}
+      custom_field_map = %{}
+      opts = [existing_error_count: 0, max_errors: 50]
+
+      assert {:ok, chunk_result} =
+               MemberCSV.process_chunk(chunk_rows_with_lines, column_map, custom_field_map, opts)
+
+      assert chunk_result.inserted == 30
+      assert chunk_result.failed == 70
+      # Should only collect 50 errors (limit reached)
+      assert length(chunk_result.errors) == 50
+    end
+
+    test "error capping with custom max_errors" do
+      # Create 20 rows with invalid emails
+      chunk_rows_with_lines =
+        1..20
+        |> Enum.map(fn i ->
+          {i + 1, %{member: %{email: "invalid-email-#{i}"}, custom: %{}}}
+        end)
+
+      column_map = %{email: 0}
+      custom_field_map = %{}
+      opts = [existing_error_count: 0, max_errors: 10]
+
+      assert {:ok, chunk_result} =
+               MemberCSV.process_chunk(chunk_rows_with_lines, column_map, custom_field_map, opts)
+
+      assert chunk_result.inserted == 0
+      assert chunk_result.failed == 20
+      assert length(chunk_result.errors) == 10
+    end
  end

  describe "validate_row/3" do