mitgliederverwaltung/docs/test-performance-optimization.md

8.2 KiB

Test Performance Optimization

Last Updated: 2026-01-28
Status: Implemented

This document records the test-suite performance work and — most importantly — the conventions that govern how tests are tagged and run. The seeds-test rationale in §1 is the canonical reference linked from test/seeds_test.exs.

Baseline result: the standard (fast) suite runs ~368 s (~6.1 min) vs. ~445 s before; the full suite (all tests) ~7.4 min. Slow-tagged tests (~25, >1 s each, ~77 s total) are excluded from standard runs and executed via promotion before merge.


1. Seeds Test Suite — coverage mapping

The seeds tests were reduced from 13 to 4. The 9 removed tests were dropped because their assertions are already covered by domain-specific test suites — this mapping is the justification and must be preserved:

Removed seeds test Covered by
"at least one member has no membership fee type assigned" membership_fees/*_test.exs
"each membership fee type has at least one member" membership_fees/*_test.exs
"members with fee types have cycles with various statuses" cycle_generator_test.exs
"creates all 5 authorization roles with correct permission sets" authorization/*_test.exs
"all roles have valid permission_set_names" authorization/permission_sets_test.exs
"does not change role of users who already have a role" merged into general idempotency test
"role creation is idempotent" (detailed) merged into general idempotency test

4 retained critical-bootstrap tests (and why)

These guard deployment-critical invariants that nothing else covers and must stay in the fast suite:

  1. Smoke test — seeds run successfully and create basic data.
  2. Idempotency — seeds can be re-run without duplicating data.
  3. Admin bootstrap — admin user exists with Admin role (critical for initial access).
  4. System-role bootstrapMitglied system role exists (critical for user registration).

If a new critical bootstrap requirement appears, add a test to the "Critical bootstrap invariants" section in test/seeds_test.exs. Removed-test risk is low: a smoke-test failure surfaces broken seeds, and domain tests verify business logic independently of seeds content.


2. Tagging convention: :slow

Tests are split into fast (standard CI) and slow (run via promotion before merge). A test is tagged @tag :slow when all of:

  • execution time > 1 s, and
  • low risk — does not catch critical regressions in core business logic, and
  • it is a UI/display/formatting test, a workflow-detail test, or an edge case with a large dataset (performance tests with 50+ records always qualify).

Never tag as :slow:

  • Core CRUD (Member/User create/update/destroy)
  • Basic authentication/authorization
  • Critical bootstrap (admin user, system roles)
  • Email synchronization
  • Representative policy tests (one per permission set + action)
  • A test that is merely slow due to inefficient setup or a bug — fix the setup/bug instead
  • An integration test — use @tag :integration instead

Use @describetag :slow (not @moduletag) for describe blocks, so unrelated tests in the same module are not tagged.

One-off isolation fix worth noting as a pattern: a test that loaded all members was slow in full runs because of cross-test data accumulation; constraining it with a search query (/members?query=Alice) made it both faster and properly isolated. Prefer query filters over loading all records.


3. Execution model

Mode Command Contents Time
Fast (default) just test-fast / mix test --exclude slow --exclude ui everything except :slow and :ui ~6 min
Slow only just test-slow / mix test --only slow the ~25 :slow tests ~1.3 min
Full just test / mix test all tests ~7.4 min

CI: standard pipeline (check-fast) runs mix test --exclude slow --exclude ui. The full suite (check-full) is triggered by promoting a Drone build to production and is required before merging to main (branch protection).

To find the slowest tests, run mix test --slowest N ad hoc. test/test_helper.exs also carries a slowest: 10 option for ExUnit.start/1, but it is commented out by default — uncomment it to print the 10 slowest tests at the end of every run.


4. Test organization

Tests mirror the lib/ structure:

test/
├── accounts/              # Accounts domain
├── membership/            # Membership domain
├── membership_fees/       # Membership fees domain
├── mv/                    # Core (accounts, membership, authorization)
├── mv_web/                # Web layer (controllers, live, components)
└── support/               # conn_case.ex, data_case.ex

5. Concurrent create_member deadlock and deferrable FKs

A class of intermittent failures (PostgreSQL deadlock_detected, SQLSTATE 40P01) was traced to concurrent create_member transactions, not to any single test. It surfaced as a MatchError on {:ok, member} = ... in member-heavy LiveView tests (e.g. FormMemberSelectionTest) and reproduced only under CPU contention (≈1 in 12 full-fast-suite runs at high async: true concurrency; effectively never on an idle machine).

Root cause. create_member writes a cascade in one transaction (member row, custom_field_values, the user link, fee-type defaulting, cycle generation). Concurrent inserts take FK FOR KEY SHARE (MultiXact) locks on shared parent rows across members / users / membership_fee_types; under contention these can form a cross-transaction lock cycle that Postgres resolves by aborting one transaction. It is a product-level concurrency property, not test-data contention, so it is not fixable by test-state isolation.

Fix. Migration …_make_member_user_fks_deferrable.exs makes the three FKs (users.member_id, users.role_id, members.membership_fee_type_id) DEFERRABLE INITIALLY DEFERRED, moving the FK check (and its lock) to commit time and breaking the cycle. Verified: 0 deadlocks in 15 full-suite runs under maximum CPU contention, versus 1/12 before. This does not weaken integrity — NOT NULL is independent of FK deferral, a real dangling reference still aborts the commit, and ON DELETE RESTRICT (e.g. users.role_id) stays immediate regardless of deferrability. Mv.DeferrableFkTest asserts the constraint state as a regression guard (a deterministic in-process concurrent reproduction is infeasible under the Ecto sandbox, which serializes connections by ownership).

This deadlock is also a latent production risk under concurrent sign-ups; the deferrable-FK fix addresses both.

Async-test-safety checklist (members/groups/custom fields)

Several member-creating test files historically used async: false with a "prevent PostgreSQL deadlocks" comment. With the deferrable-FK migration in place those files are deadlock-safe, but before flipping any such file to async: true:

  • Prove isolation under load, not just one green run. Re-run the file (and the full suite) under varying --seed and CPU contention; a single green run is not evidence (the deadlock and the isolation flakes below are load-dependent).
  • Watch for separate async-isolation issues beyond the deadlock. index_groups_url_params_test.exs and member_filter_component_test.exs showed filtered-member-leak failures (refute html =~ name) under concurrency that are independent of the FK deadlock — these need their own per-file isolation fix before they can run async.

StreamData generator pitfall

FilterTooNarrowError appeared on unlucky seeds (e.g. 222) in a property test that built a value with a reject-filter (StreamData.filter discarding ~1/4 of generated pairs). Under full property-run counts this hits too many consecutive rejections. Fix: construct the desired value directly instead of generating-then-filtering (preserves the exact domain, no rejection). Prefer constructive generators over reject-filters in property tests.


References

  • Testing Standards: CODE_GUIDELINES.md §4
  • CI/CD: .drone.jsonnet
  • Test helper: test/test_helper.exs
  • Just commands: Justfile (test-fast, test-slow, test)