fix existing flakiness + cut runtime closes #533 #544
1 changed files with 23 additions and 0 deletions
|
|
@ -90,6 +90,29 @@ test/
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
## 5. Concurrent `create_member` deadlock and deferrable FKs
|
||||||
|
|
||||||
|
A class of intermittent failures (PostgreSQL `deadlock_detected`, SQLSTATE `40P01`) was traced to **concurrent `create_member` transactions**, not to any single test. It surfaced as a `MatchError` on `{:ok, member} = ...` in member-heavy LiveView tests (e.g. `FormMemberSelectionTest`) and reproduced only under CPU contention (≈1 in 12 full-fast-suite runs at high `async: true` concurrency; effectively never on an idle machine).
|
||||||
|
|
||||||
|
**Root cause.** `create_member` writes a cascade in one transaction (member row, `custom_field_values`, the `user` link, fee-type defaulting, cycle generation). Concurrent inserts take FK `FOR KEY SHARE` (MultiXact) locks on shared parent rows across `members` / `users` / `membership_fee_types`; under contention these can form a cross-transaction lock cycle that Postgres resolves by aborting one transaction. It is a product-level concurrency property, **not** test-data contention, so it is not fixable by test-state isolation.
|
||||||
|
|
||||||
|
**Fix.** Migration `…_make_member_user_fks_deferrable.exs` makes the three FKs (`users.member_id`, `users.role_id`, `members.membership_fee_type_id`) `DEFERRABLE INITIALLY DEFERRED`, moving the FK check (and its lock) to commit time and breaking the cycle. Verified: **0 deadlocks in 15 full-suite runs under maximum CPU contention**, versus 1/12 before. This does **not** weaken integrity — `NOT NULL` is independent of FK deferral, a real dangling reference still aborts the commit, and `ON DELETE RESTRICT` (e.g. `users.role_id`) stays immediate regardless of deferrability. `Mv.DeferrableFkTest` asserts the constraint state as a regression guard (a deterministic in-process concurrent reproduction is infeasible under the Ecto sandbox, which serializes connections by ownership).
|
||||||
|
|
||||||
|
This deadlock is also a latent **production** risk under concurrent sign-ups; the deferrable-FK fix addresses both.
|
||||||
|
|
||||||
|
### Async-test-safety checklist (members/groups/custom fields)
|
||||||
|
|
||||||
|
Several member-creating test files historically used `async: false` with a "prevent PostgreSQL deadlocks" comment. With the deferrable-FK migration in place those files are deadlock-safe, but before flipping any such file to `async: true`:
|
||||||
|
|
||||||
|
- **Prove isolation under load, not just one green run.** Re-run the file (and the full suite) under varying `--seed` **and** CPU contention; a single green run is not evidence (the deadlock and the isolation flakes below are load-dependent).
|
||||||
|
- **Watch for separate async-isolation issues beyond the deadlock.** `index_groups_url_params_test.exs` and `member_filter_component_test.exs` showed filtered-member-leak failures (`refute html =~ name`) under concurrency that are independent of the FK deadlock — these need their own per-file isolation fix before they can run async.
|
||||||
|
|
||||||
|
### StreamData generator pitfall
|
||||||
|
|
||||||
|
`FilterTooNarrowError` appeared on unlucky seeds (e.g. 222) in a property test that built a value with a reject-filter (`StreamData.filter` discarding ~1/4 of generated pairs). Under full property-run counts this hits too many consecutive rejections. Fix: **construct the desired value directly** instead of generating-then-filtering (preserves the exact domain, no rejection). Prefer constructive generators over reject-filters in property tests.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## References
|
## References
|
||||||
|
|
||||||
- Testing Standards: `CODE_GUIDELINES.md` §4
|
- Testing Standards: `CODE_GUIDELINES.md` §4
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue