An append-only operation log for syncing game records

When a player saves a game on their phone, edits it on their iPad, and then opens their public profile on the web, all three should agree on what happened. The simplest way to make that true is to sync whole records and let the latest write win. I did not go that route. The Pinball Points sync API pushes and pulls an append-only log of operations, and that one decision quietly headed off a whole category of bugs I might otherwise have spent a long time chasing.

The backend is a small Fastify service in TypeScript sitting in front of PocketBase (SQLite plus auth). The single interesting endpoint is POST /api/sync: a client sends its pending ops, the server applies them and returns any ops it has not seen yet. Everything in this post lives inside that one round trip.

Why last-write-wins loses

Last-write-wins on whole records is appealing because it is simple: each device pushes its copy of a record, the server keeps whichever arrived most recently, and everyone eventually downloads that copy. It works right up until two devices touch the same data while one of them is offline.

Imagine a player corrects a misread score on their phone while their iPad, still holding a stale copy, syncs a few seconds later for an unrelated change. The iPad's whole-record write overwrites the correction, and there is no record that the correction ever existed. The data did not converge; it regressed. Deletes are worse: a delete and an edit racing each other can resurrect a record the player thought was gone.
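To make that failure concrete, here is a toy reproduction. The record shape, field names, and values are made up for illustration and are not the app's real data model.

```typescript
// Hypothetical record shape; field names are illustrative only.
interface ScoreRecord {
  id: string;
  playerScore: number;
  venueName: string;
}

// Last-write-wins on whole records: the most recent arrival replaces everything.
function lastWriteWins(_current: ScoreRecord, incoming: ScoreRecord): ScoreRecord {
  return { ...incoming };
}

const original: ScoreRecord = { id: "s1", playerScore: 1_200_000, venueName: "Arcade" };

// The phone corrects the misread score and syncs first.
const fromPhone: ScoreRecord = { ...original, playerScore: 12_000_000 };
// The iPad still holds the stale copy and changes an unrelated field.
const fromIpad: ScoreRecord = { ...original, venueName: "Arcade (upstairs)" };

let server: ScoreRecord = original;
server = lastWriteWins(server, fromPhone);
server = lastWriteWins(server, fromIpad);

// The venue edit landed, but the score correction was silently reverted.
console.log(server.playerScore); // 1200000
```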

The root problem is that a whole-record write throws away intent. It says "the record now looks like this" without saying what changed or why. I wanted a sync engine where intent survives the trip, so the server can reason about each change rather than just clobbering state.

Operations, not snapshots

So clients do not push records. They push operations. Each op is a small, self-describing fact: an upsert of these fields on this entity, or a delete of this entity. The client-side shape is validated on arrival with Zod, which doubles as the contract documentation:

const ClientOpSchema = z.object({
  opId:       z.string().uuid(),              // stable, client-generated
  entityType: z.enum(["score", "profile"]),
  entityId:   z.string().min(1).max(200),
  opType:     z.enum(["upsert", "delete"]),
  payload:    PayloadSchema.optional().default(null), // an omitted payload is normalized to null
  clientTime: z.string().datetime().max(100),         // ISO 8601 timestamp reported by the device
});
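For a feel of what that looks like in practice, here are two ops written against a plain TypeScript mirror of the schema. The interface, the IDs, and the values are illustrative assumptions; the real contract is the Zod schema above.

```typescript
// Plain-TypeScript mirror of the op shape, for illustration only.
interface ClientOp {
  opId: string;
  entityType: "score" | "profile";
  entityId: string;
  opType: "upsert" | "delete";
  payload: Record<string, unknown> | null;
  clientTime: string;
}

// "Set playerScore on this score": only the changed field travels.
const correction: ClientOp = {
  opId: "3f2b8c1e-9a4d-4f6b-8c2e-1d7a5b9e0c34", // made-up UUID
  entityType: "score",
  entityId: "score_abc123", // made-up ID
  opType: "upsert",
  payload: { playerScore: 12_000_000 },
  clientTime: "2024-05-01T18:30:00Z",
};

// "This score is gone": a delete carries no payload.
const removal: ClientOp = {
  opId: "7c9e6679-7425-40de-944b-e07fc1f90ae7", // made-up UUID
  entityType: "score",
  entityId: "score_def456", // made-up ID
  opType: "delete",
  payload: null,
  clientTime: "2024-05-01T18:31:12Z",
};

const pendingOps: ClientOp[] = [correction, removal];
```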

On the server, applying an op is deliberately ordered. For a score upsert it looks like this:

Check the idempotency guard: has this opId already been recorded for this user?
Check for a tombstone; if the score was already deleted, reject the op with entity_deleted rather than resurrecting it.
Assign a monotonic per-user serverSeq and write the op into the append-only ops collection first.
Then apply it to the materialized scores collection (the live data the public profile pages read), bumping a per-record version counter.
If the apply fails, delete the op record so the log never contains an op that was never applied.
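The ordering above can be sketched with in-memory stand-ins for the ops log and the materialized collection. This is a simplified illustration, not the production code: the UserStore class, its fields, and the merge-on-upsert behaviour are assumptions, and the real service talks to PocketBase rather than arrays and maps.

```typescript
type OpType = "upsert" | "delete";

interface ClientOp {
  opId: string;
  entityId: string;
  opType: OpType;
  payload: Record<string, unknown> | null;
}

type ApplyResult =
  | { status: "applied"; serverSeq: number }
  | { status: "duplicate" }
  | { status: "rejected"; reason: "entity_deleted" };

// In-memory stand-ins for ONE user's ops log, scores collection, and tombstones.
class UserStore {
  readonly ops: Array<ClientOp & { serverSeq: number }> = [];
  readonly scores = new Map<string, { version: number; fields: Record<string, unknown> }>();
  readonly tombstones = new Set<string>();
  private lastSeq = 0;

  applyScoreOp(op: ClientOp): ApplyResult {
    // 1. Idempotency guard: a replayed opId is acknowledged, not re-applied.
    if (this.ops.some((o) => o.opId === op.opId)) return { status: "duplicate" };

    // 2. Tombstone check: never resurrect a deleted score.
    if (op.opType === "upsert" && this.tombstones.has(op.entityId)) {
      return { status: "rejected", reason: "entity_deleted" };
    }

    // 3. Assign the next sequence number and record the op first.
    const serverSeq = ++this.lastSeq;
    this.ops.push({ ...op, serverSeq });

    try {
      // 4. Project the op onto the materialized data.
      this.materialize(op);
    } catch (err) {
      // 5. Roll back the log entry so the log never lists an unapplied op.
      //    The unused sequence number leaves a gap, which a "greater than" cursor tolerates.
      this.ops.pop();
      throw err;
    }
    return { status: "applied", serverSeq };
  }

  private materialize(op: ClientOp): void {
    if (op.opType === "delete") {
      this.scores.delete(op.entityId);
      this.tombstones.add(op.entityId);
      return;
    }
    const existing = this.scores.get(op.entityId);
    this.scores.set(op.entityId, {
      version: (existing?.version ?? 0) + 1, // bump the per-record version
      fields: { ...(existing?.fields ?? {}), ...(op.payload ?? {}) },
    });
  }
}

const demoStore = new UserStore();
const demoOp: ClientOp = { opId: "op-1", entityId: "score-1", opType: "upsert", payload: { playerScore: 42 } };
console.log(demoStore.applyScoreOp(demoOp).status); // "applied"
console.log(demoStore.applyScoreOp(demoOp).status); // "duplicate"
```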

Recording the op before applying it is the part that matters most. The log is the source of truth; the materialized collection is just a projection of it. The serverSeq gives every device a single agreed-upon cursor: "I have seen everything up to sequence N, send me what came after." Clients carry that position as an opaque sinceToken, which is nothing more exotic than a base64-wrapped { "seq": N }. Keeping it opaque means I can change the cursor representation later without breaking older clients.

export function encodeSinceToken(seq: number): string {
  return Buffer.from(JSON.stringify({ seq })).toString("base64");
}

Because sequence numbers are monotonic and per-user, an incremental pull is just a range query: serverSeq > sinceSeq, sorted ascending. No diffing, no timestamps to reconcile across clock-skewed devices.
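The read side might look roughly like this. decodeSinceToken and pullSince are hypothetical names, the in-memory array stands in for the ops collection, and falling back to 0 on a malformed token is one possible policy (a stricter server could return a 400 instead).

```typescript
// Same encoder as above, repeated so this sketch runs on its own.
function encodeSinceToken(seq: number): string {
  return Buffer.from(JSON.stringify({ seq })).toString("base64");
}

// Hypothetical counterpart: a missing or malformed token means "start from 0".
function decodeSinceToken(token: string | undefined): number {
  if (!token) return 0;
  try {
    const parsed: unknown = JSON.parse(Buffer.from(token, "base64").toString("utf8"));
    const seq = (parsed as { seq?: unknown } | null)?.seq;
    return typeof seq === "number" && Number.isInteger(seq) && seq >= 0 ? seq : 0;
  } catch {
    return 0;
  }
}

interface LoggedOp {
  serverSeq: number;
  opId: string;
}

// The incremental pull: everything after the cursor, in sequence order.
function pullSince(log: LoggedOp[], sinceSeq: number): LoggedOp[] {
  return log
    .filter((op) => op.serverSeq > sinceSeq)
    .sort((a, b) => a.serverSeq - b.serverSeq);
}

const exampleLog: LoggedOp[] = [
  { serverSeq: 3, opId: "c" },
  { serverSeq: 1, opId: "a" },
  { serverSeq: 2, opId: "b" },
];
const cursor = decodeSinceToken(encodeSinceToken(1));
console.log(pullSince(exampleLog, cursor).map((op) => op.opId)); // [ 'b', 'c' ]
```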

Making retries safe

Networks fail mid-flight. A client sends an op, the connection drops before the response arrives, and the client has no idea whether the server got it. The only safe thing it can do is send it again. That means the server has to treat a duplicate op as a no-op, not a second write.

This is where the client-generated UUID opId earns its keep. Before doing any work, the server looks up whether that opId already exists for the user. If it does, the work is already done, so the server simply adds that opId to the acknowledgments list and moves on. Retries become free. The client can stay simple and stubborn, which is exactly what you want from a client draining an outbox on a flaky cafe network.

An op that is safe to apply exactly once is good. An op that is safe to apply any number of times is the one you can comfortably ship.
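On the client side, "simple and stubborn" can be as small as the loop below. This is a sketch under assumptions: drainOutbox, SendBatch, and the ackedOpIds response field are hypothetical names, and a real client would add backoff between attempts.

```typescript
interface PendingOp {
  opId: string;
}

// Hypothetical transport: sends a batch, resolves with the opIds the server acknowledged.
type SendBatch = (ops: PendingOp[]) => Promise<{ ackedOpIds: string[] }>;

// Keep resending whatever has not been acknowledged, always with the same opIds.
async function drainOutbox(
  outbox: PendingOp[],
  send: SendBatch,
  maxAttempts = 5,
): Promise<PendingOp[]> {
  let pending = [...outbox];
  for (let attempt = 0; attempt < maxAttempts && pending.length > 0; attempt++) {
    try {
      const { ackedOpIds } = await send(pending);
      const acked = new Set(ackedOpIds);
      pending = pending.filter((op) => !acked.has(op.opId));
    } catch {
      // Outcome unknown: the server may or may not have applied the batch.
      // Resending is safe only because the server deduplicates by opId.
    }
  }
  return pending; // anything still here stays in the outbox for the next sync
}
```

The client never needs to work out whether a failed request actually reached the server; the opId check makes that question irrelevant.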

Serializing per-user writes

Idempotency keeps a single client honest, but it does not stop two of a player's devices from writing at the same instant. Two concurrent requests could both read the current max serverSeq, both decide they are next, and both claim the same number. That would corrupt the one invariant the whole design rests on.

I serialize writes per user with a promise-chain lock. Each user has a chain in an in-memory Map; a new request appends its own gate to the tail and waits for the previous link to resolve, so a single user's ops are applied one at a time in arrival order. Crucially the lock is per user, not global, so one player syncing never blocks another. The acquire carries a 30-second timeout, and on timeout it releases its own gate so a stuck request cannot wedge that user's chain forever.

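// gate: a promise this request resolves once its own work is done
// timeout: a promise that settles after 30 seconds
const previous = userQueues.get(userId) ?? Promise.resolve();
const queued = previous.then(() => gate); // the new tail: everyone before us, then us
userQueues.set(userId, queued);
await Promise.race([previous, timeout]); // wait our turn, or bail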
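Filled out into something runnable, the same idea might look like the helper below. withUserLock, its signature, and the error message are assumptions made for this sketch; only the chain-and-gate structure and the 30-second timeout come from the description above.

```typescript
const userQueues = new Map<string, Promise<void>>();

// Runs `work` once every earlier request for the same user has released its gate.
async function withUserLock<T>(
  userId: string,
  work: () => Promise<T>,
  timeoutMs = 30_000,
): Promise<T> {
  // Our gate: resolved in `finally`, whether we ran, failed, or timed out.
  let release!: () => void;
  const gate = new Promise<void>((resolve) => {
    release = resolve;
  });

  const previous = userQueues.get(userId) ?? Promise.resolve();
  const queued = previous.then(() => gate); // new tail of this user's chain
  userQueues.set(userId, queued);

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("lock timeout")), timeoutMs);
  });

  try {
    await Promise.race([previous, timeout]); // wait our turn, or bail
    return await work();
  } finally {
    clearTimeout(timer);
    release(); // requests queued behind us are never blocked by our gate
  }
}
```

Because gates only ever resolve, the chain itself never rejects; an error thrown by work reaches only the caller that owns it.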

A lock map that only ever grows is a slow memory leak, and I shipped exactly that bug before catching it. The fix is a sweep every 60 seconds that probes each entry by racing its promise against an instantly-resolved one; if the user's promise has already settled, the entry is dropped. The lock for an idle user disappears and is recreated cheaply the next time that user writes, so the map stays bounded no matter how many players the app has.
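The probe relies on a detail of Promise.race: when the first promise in the array is already settled, it wins against an already-resolved sentinel listed after it. A sketch of the sweep, with isSettled and sweepIdleLocks as hypothetical names and the map passed in as a parameter:

```typescript
const PENDING = Symbol("pending");

// True if `p` has already settled. The probed promise must be raced directly
// and listed first; wrapping it in .then() would delay it and let the sentinel win.
async function isSettled(p: Promise<unknown>): Promise<boolean> {
  try {
    const winner = await Promise.race([p, Promise.resolve(PENDING)]);
    return winner !== PENDING;
  } catch {
    return true; // a rejected promise is settled too
  }
}

// Drops entries whose chain tail has settled; returns how many were removed.
async function sweepIdleLocks(queues: Map<string, Promise<unknown>>): Promise<number> {
  let removed = 0;
  for (const [userId, tail] of Array.from(queues)) {
    // Re-check the tail after the await: a new request may have replaced it meanwhile.
    if ((await isSettled(tail)) && queues.get(userId) === tail) {
      queues.delete(userId);
      removed++;
    }
  }
  return removed;
}

// Scheduling is left to the caller, for example:
// setInterval(() => void sweepIdleLocks(userQueues), 60_000);
```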

Pruning the log without breaking clients

An append-only log grows forever, and most of it is ancient history no device will ever ask for again. A scheduled job runs every six hours and prunes ops older than a seven-day retention window. But pruning is dangerous: if I delete the ops a client still needs and it shows up with an old cursor, an incremental pull would silently skip everything that was trimmed.

So compaction records a watermark: the highest sequence that has been pruned away for that user. On the next sync, the server compares the client's sinceSeq against the watermark:

If the client is caught up past the watermark, it gets the normal incremental range query.
If the client is behind it (a device offline for over a week, or a fresh reinstall), the server abandons the log and builds a snapshot instead, walking the live scores and profiles collections and emitting synthetic upsert and delete ops that rebuild current state from scratch. The response sets snapshot: true so the client knows to treat it as a reset.
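The branch itself is tiny. In the sketch below, planPull and buildScoreSnapshot are hypothetical names, and two details are assumptions: a cursor exactly equal to the watermark is treated as safe (nothing after it has been pruned), and synthetic deletes are derived from a set of deleted IDs.

```typescript
interface SyntheticOp {
  entityType: "score" | "profile";
  entityId: string;
  opType: "upsert" | "delete";
  payload: Record<string, unknown> | null;
}

type PullPlan = { mode: "incremental"; sinceSeq: number } | { mode: "snapshot" };

// watermark = highest serverSeq already pruned for this user (0 if nothing was pruned).
function planPull(sinceSeq: number, watermark: number): PullPlan {
  // Ops in (sinceSeq, watermark] no longer exist, so a cursor below the
  // watermark cannot be served from the log.
  return sinceSeq >= watermark ? { mode: "incremental", sinceSeq } : { mode: "snapshot" };
}

// Rebuild current state as ops: one upsert per live row, one delete per removed ID.
function buildScoreSnapshot(
  liveScores: Map<string, Record<string, unknown>>,
  deletedIds: Iterable<string>,
): SyntheticOp[] {
  const ops: SyntheticOp[] = [];
  for (const [entityId, payload] of Array.from(liveScores)) {
    ops.push({ entityType: "score", entityId, opType: "upsert", payload });
  }
  for (const entityId of Array.from(deletedIds)) {
    ops.push({ entityType: "score", entityId, opType: "delete", payload: null });
  }
  return ops;
}

console.log(planPull(120, 80).mode); // "incremental"
console.log(planPull(10, 80).mode); // "snapshot"
```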

The snapshot path is the safety net that lets me prune aggressively. Common case stays cheap; the rare stale client pays a one-time full download and then rejoins the incremental stream.

Trusting the client only so far

An operation log is still a pipe from the client into the database, and it is wise not to let a client write whatever it wants. There are two layers of defense. Zod validates the structural shape and bounds on the way in: a profile slug must match ^[a-z0-9][a-z0-9-]*[a-z0-9]$ and be 3 to 30 characters, allScores is at most six integers, a sync batch is capped at 500 ops. The schemas use .passthrough() on purpose, because the second layer is the real gate: every payload runs through an explicit field allowlist right before it touches the database.

const SCORE_FIELDS = new Set([
  "machineName", "allScores", "playerCount", "playerScore",
  "playedAt", "latitude", "longitude", "venueName", /* ... */
]);

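// Keep only allowlisted keys; a null or missing payload yields an empty object.
function pickAllowed(
  payload: Record<string, unknown> | null | undefined,
  allowed: ReadonlySet<string>,
): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(payload ?? {}).filter(([k]) => allowed.has(k))
  );
}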

Only allowlisted fields survive; anything else is dropped on the floor. That stops a buggy or misbehaving client from setting fields it has no business touching, like the server-owned version or user, and it keeps the materialized data clean so the server-rendered public profiles never have to defend against junk that snuck in through sync.
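A quick illustration of the filter in action. The allowlist here is a shortened subset and the payload values are invented; the function body matches the one above, with types added so the sketch stands alone.

```typescript
// Shortened allowlist, for illustration only.
const DEMO_SCORE_FIELDS: ReadonlySet<string> = new Set([
  "machineName", "playerScore", "venueName",
]);

function pickAllowed(
  payload: Record<string, unknown> | null | undefined,
  allowed: ReadonlySet<string>,
): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(payload ?? {}).filter(([k]) => allowed.has(k)),
  );
}

// A payload that tries to smuggle in server-owned fields.
const incoming = {
  machineName: "Medieval Madness",
  playerScore: 12_000_000,
  version: 999,
  user: "someone-else",
};

const safe = pickAllowed(incoming, DEMO_SCORE_FIELDS);
console.log(Object.keys(safe)); // [ 'machineName', 'playerScore' ]
```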

What I took away

Modeling sync as an append-only log was more work upfront than copying whole records, and I think it earned every bit of that effort.

Logging the intent, not the snapshot, is what lets edits and deletes converge instead of overwriting each other.
Write the op before you apply it, and roll it back if the apply fails, so the log can never misrepresent what happened.
A monotonic per-user serverSeq wrapped in an opaque token turns incremental pull into a plain range query.
Stable UUID op IDs plus a processed-op check turn retries into a non-event, which lets the client stay dumb and persistent.
A per-user serialized lock protects that sequence; a periodic sweep keeps the lock map from leaking memory (a bug I had to learn from).
Compaction with a watermark lets you prune the log aggressively, as long as you keep a snapshot path for the client that fell too far behind.
Validate structure with a schema, but gate the actual writes with an allowlist; treat the client as an untrusted source, because it is one.

The payoff is the quiet kind I find satisfying: a player edits a score on one device, and every other device and the website simply agree. No resurrected records, no lost corrections, no clever conflict UI to design because the conflicts mostly stopped happening.
