Race conditions are a familiar problem. Most of us have encountered them, or at least know we should worry about them. The standard defense is to think carefully. Review the code. Reason about what could happen if two requests arrive at the same time. Add a lock if it feels dangerous.
The problem with "think carefully" is that it fails silently. You can't tell the difference between code that has no race conditions and code where you just haven't thought of the right interleaving yet. Any system where success depends on you always thinking hard enough has already failed. You just don't know it yet.
We do a lot of agentic coding at FRAGMENT. AI agents are remarkably good at writing code, but they have a fundamental limitation: they can't hold an entire codebase in context. An agent working on file A doesn't know that its changes affect file B. It doesn't remember what it worked on last week.
This isn't a criticism of agents—humans have the same problem, just slower. But agents make more changes faster, which means more opportunities for subtle bugs to slip in.
Consider what happens when an agent adds a feature. It reads the relevant files, makes changes that look correct in isolation, and moves on. But those changes might introduce a race condition in code the agent never looked at. The code passes review because it looks right. The tests pass because they don't exercise the concurrent case. The bug waits in production for the wrong interleaving.
We wanted tests that would catch these bugs whether or not anyone was thinking about them. Tests that explore interleavings systematically rather than hoping to stumble on the bad one. Property-based testing with a scheduler gives us exactly that.
Before diving into the technique, it's worth naming the patterns. Race conditions come in a few recognizable shapes:
Write-after-read (lost update): Two processes read a value, compute a new value, and write it back. One write overwrites the other. The classic example is incrementing a counter—both processes read 5, both write 6, and an increment is lost (a minimal sketch follows this list).
Check-then-act (TOCTOU): Check if a condition is true, then act on it. But the condition changed between check and act. Checking inventory before reserving. Checking permissions before accessing. The check passes, but by the time you act, reality has shifted.
Stale cache write: A slow reader fetches from the database, a writer updates the database and invalidates the cache, then the slow reader writes stale data to the cache. The invalidation happened before the stale write.
Silent overwrite: Two operations both "succeed" and return success to their clients, but one overwrote the other. The client was told their operation worked, but it was actually lost. No error, no indication—just a lie.
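To make the first of these shapes concrete, here's a minimal sketch of a lost update; db, readBalance, and writeBalance are hypothetical stand-ins for whatever storage calls your code makes:

// Two concurrent increments, each doing read-then-write on shared state.
// Under the bad interleaving, both read 5 and both write 6: one increment is lost.
const increment = async (db, accountId) => {
  const current = await db.readBalance(accountId); // both requests read 5
  await db.writeBalance(accountId, current + 1);   // both requests write 6
};

await Promise.all([increment(db, 'acct_1'), increment(db, 'acct_1')]);
// Expected balance: 7. Under the lost-update interleaving: 6.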
Each of these can emerge during agentic coding. An agent adds caching without realizing the read path now races with writes. An agent parallelizes requests for performance without seeing the shared state they modify. An agent implements optimistic locking but misses an edge case in the retry logic.
The code looks correct. It is correct, for sequential requests. The bug only appears when requests interleave in just the wrong way.
Property-based testing normally means: generate random inputs, check that some property holds. For race condition testing, the "random input" is the ordering of async operations.
The fast-check library has a scheduler that controls when promises resolve. Instead of letting the JavaScript runtime decide the order, you yield that control to the scheduler. It tries different orderings, looking for one that breaks your test.
When it finds a failure, it gives you the exact sequence of operations that caused the problem. You can see precisely which interleaving broke your invariant.
The key is wrapping your database operations so they yield to the scheduler. Each operation gets a label, so a failing run reads as a minimal reproducer: a labeled sequence of operations you can trace through to understand exactly what happened.
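A minimal sketch of that wrapping, assuming a hypothetical realDb client (s is the scheduler from fc.scheduler(), and the label is whatever will make the trace readable):

// Sketch: yield to the fast-check scheduler before performing the real call,
// so the scheduler decides where this operation lands in the interleaving.
// `realDb` and its `get` method are illustrative placeholders.
const scheduledGet = (realDb, s, label) => async (key) => {
  await s.schedule(Promise.resolve(), `${label}.get(${key})`);
  return realDb.get(key);
};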
Let me walk through one example in detail. I chose this one because the code looks correct—it uses all the right patterns—and yet still fails.
Idempotency keys prevent duplicate operations. The client sends a unique key with each request. If the server has seen that key before, it returns the cached result. If not, it processes the request and caches the result for future replays.
Here's an implementation that uses a retry loop with conditional writes:
const processPayment = async (db, idempotencyKey, payload) => {
  const key = `idempotency:${idempotencyKey}`;
  for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
    // Check if we've already processed this key
    const existing = await db.get(key);
    if (existing) {
      if (deepEqual(existing.payload, payload)) {
        return { ...existing.result, wasReplay: true };
      }
      throw new Error('Idempotency key reused with different payload');
    }
    // Try to claim this key atomically
    const claimed = await db.conditionalSet(key, { payload });
    if (claimed) {
      // We won - process the payment
      const result = await chargeCard(payload);
      await db.set(key, { payload, result });
      return result;
    }
    // Lost the race - retry
  }
  throw new Error('Failed after retries');
};
This code looks solid. We read first to handle the common case. We use a conditional write to claim the key atomically. If we lose the race, we retry, looping back to re-read and check whether it's now a replay. An agent might write exactly this; it follows all the best practices.
But there's a bug. The db.get call has a cache (as many ORMs and database clients do), and we forgot to invalidate it between retries. On each retry, the cache returns the stale value from attempt 0: undefined. The loop keeps trying to claim a key that's already been claimed, never recognizing it's a replay.
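The cachedDB helper used in the test below isn't shown in full here; roughly, it combines a scheduler-yielding wrapper like the one above with a per-request read cache. A sketch of its shape (an assumption, not the exact harness):

// Sketch of a per-request cache in front of db.get, with scheduler yields.
// Nothing here invalidates `cache` between retries - that's the bug.
const cachedDB = (realDb, s, label) => {
  const cache = new Map();
  const yieldTo = (op) => s.schedule(Promise.resolve(), `${label}.${op}`);
  return {
    get: async (key) => {
      if (cache.has(key)) {
        await yieldTo('checkExisting [CACHE HIT]');
        return cache.get(key); // stale: whatever we saw on the first read
      }
      await yieldTo('checkExisting');
      const value = await realDb.get(key);
      cache.set(key, value); // a miss caches `undefined`, too
      return value;
    },
    conditionalSet: async (key, value) => {
      await yieldTo('conditionalSet');
      return realDb.conditionalSet(key, value);
    },
    set: async (key, value) => {
      await yieldTo('saveResult');
      return realDb.set(key, value);
    },
    invalidate: (key) => cache.delete(key), // used by the fix further down
  };
};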
await fc.assert(
  fc.asyncProperty(fc.scheduler(), async (s) => {
    const db1 = cachedDB(db, s, 'request1');
    const db2 = cachedDB(db, s, 'request2');
    // Same key, same payload - should be a replay
    const payload = { amount: 100, card: 'tok_123' };
    const p1 = processPayment(db1, 'key-123', payload);
    const p2 = processPayment(db2, 'key-123', payload);
    await s.waitAll();
    const results = await Promise.allSettled([p1, p2]);
    // Property: identical payloads should never throw
    const errors = results.filter(r => r.status === 'rejected');
    expect(errors).toHaveLength(0);
  })
);
The test fails on the first run. Here's the counterexample:
Counterexample: [schedulerFor()
-> [task#2] promise::request2.checkExisting(attempt=0) resolved
-> [task#1] promise::request1.checkExisting(attempt=0) resolved
-> [task#3] promise::request2.conditionalSet(attempt=0) resolved
-> [task#4] promise::request1.conditionalSet(attempt=0) resolved
-> [task#6] promise::request1.checkExisting(attempt=1) [CACHE HIT] resolved
-> [task#7] promise::request1.conditionalSet(attempt=1) resolved
-> [task#8] promise::request1.checkExisting(attempt=2) [CACHE HIT] resolved
-> [task#5] promise::request2.saveResult resolved
-> [task#9] promise::request1.conditionalSet(attempt=2) resolved]
Both requests read on attempt 0—both cache undefined. Request 2 wins the conditional write. Request 1 loses and retries, but look at attempts 1 and 2: [CACHE HIT]. Each retry reads from the stale cache, gets undefined, and tries to claim the key again. After exhausting retries, it throws "Failed after retries" for what should have been a simple replay.
Here's the thing: this code used to work. The original implementation had no cache. The re-read after losing the race hit the database and got the correct value. The tests passed. It shipped.
Then, last month, an agent was asked to improve API latency. It noticed repeated database reads and added a per-request cache. Reasonable optimization. The agent didn't touch the idempotency code—it just wrapped the database client. The change looked safe. The existing tests still passed because they don't exercise concurrent requests.
Now the idempotency code re-reads from a cache that's lying to it. The bug wasn't in the original code or in the caching code. It emerged from their interaction, in an interleaving that neither author considered.
The race condition test catches it by exploring that interleaving. The property is simple: two requests with the same key and identical payloads should both succeed; neither should ever throw. The scheduler finds a counterexample.
Invalidate the cache before each retry:
for (let attempt = 0; attempt < MAX_RETRIES; attempt++) {
  // Invalidate cache on retry
  if (attempt > 0) db.invalidate(key);
  const existing = await db.get(key);
  // ... rest of loop
}
The fix is small, but you have to know the cache exists. The bug only appears when you lose a race and the cache lies about what's in the database.
The idempotency race is instructive, but you shouldn't have to solve it yourself.
At FRAGMENT, idempotency is built into the API. When you post a Ledger Entry, you provide an ik (idempotency key). If you post the same ik twice—whether due to a retry, a network hiccup, or a client bug—the second request returns the result of the first. Guaranteed, regardless of timing.
This isn't implemented with check-then-act. The idempotency check and the ledger write happen in a single atomic transaction. There's no window where two requests can both pass the check. The race condition we demonstrated above is impossible.
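FRAGMENT's internals aren't shown in this post, but the shape is worth illustrating. As a generic sketch only, assuming a hypothetical db.transaction that provides conflict-checked (for example, serializable) semantics, the check and the write collapse into one atomic step, so there's no window for a second request to slip between them:

// Generic illustration, not FRAGMENT's implementation: the idempotency check
// and the ledger write share one atomic transaction. `db.transaction`,
// `tx.get`, `tx.set`, and `tx.insertLedgerEntry` are hypothetical.
const postEntryAtomic = (db, ik, entry) =>
  db.transaction(async (tx) => {
    const existing = await tx.get(`idempotency:${ik}`);
    if (existing) {
      return { ...existing.result, wasReplay: true }; // replay: return the first result
    }
    const result = await tx.insertLedgerEntry(entry);
    await tx.set(`idempotency:${ik}`, { result });
    return result;
  });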
The same principle applies throughout the API.
This matters because it removes entire categories of bugs from your application code. You don't need to implement atomic idempotency checks, worry about conditional writes, or reason about transaction isolation levels. You call the API with an idempotency key and the problem is solved.
We still use property-based race condition testing internally—that's how we verify these guarantees hold under every interleaving we can find. But for FRAGMENT users, the hard races are eliminated at the API level. You can focus on your product logic instead of distributed systems edge cases.
A few things we've learned:
Labels matter. When a test fails, you're reading a sequence of operations. Labels like request1.checkIdempotencyKey make the failure obvious. Labels like actor1.get make you squint.
A few runs are usually enough. The scheduler finds bad interleavings quickly. If there's a race condition, it usually shows up in the first few runs. We often use numRuns: 100, but even 10 is enough to catch most issues (see the snippet after this list).
Wrap at the database layer. The cleanest approach is wrapping your database calls so they yield to the scheduler. Your business logic doesn't need to know it's being tested for races.
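For reference, the run count goes in fc.assert's second argument:

await fc.assert(
  fc.asyncProperty(fc.scheduler(), async (s) => {
    // ... wrap clients, fire concurrent requests, assert the property ...
    await s.waitAll();
  }),
  { numRuns: 100 } // we often use 100; even 10 catches most races
);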
The full runnable examples from this post are available on GitHub.
Your code's defense against race conditions shouldn't be "I thought about it carefully." It should be a test that explores interleavings and finds counterexamples.
Thinking carefully is still valuable. But it isn't evidence. The evidence is a passing test that actually exercises the concurrent behavior you're worried about (or a proof of correctness, whether from a system like TLA+ or from pen-and-paper mathematics).
We're not wearing lab coats, but we're still doing science. The scientific method makes a pretty good guide.