Fixing question data without an App Store release

A wrong answer key or a typo in a question used to mean a full App Store submission and days of review just to fix a single line of data. For a placement test that spans dozens of languages, that adds up quickly. So I tried not shipping questions the same way I ship code, and that one decision ended up shaping how the whole product is built, deployed, and even how it feels to look for mistakes.

Two ship channels, not one

The core idea is simple to say: app code and test content move through different pipelines at different speeds. App code goes through App Store review, as it should. The actual test questions do not. They live as JSON blueprints served from a content API, and they update on their own schedule, independent of any binary already on a phone.

This split matters because the two things fail in different ways and at different rates. A scoring engine or a SwiftUI view changes rarely and genuinely benefits from review. A question with a transposed answer key is a data defect, and data defects are common, mundane, and time-sensitive. Tying the fix for a one-character typo to a multi-day release cycle felt like the wrong trade, and once I saw it that way the alternative came into focus.

A manifest the app polls first

Content is indexed by a single manifest.json the app fetches before anything else. It carries a schemaVersion, a generatedAt timestamp, a contentBaseUrl, and a dictionary of languages. Each language entry holds an optional blueprint version and an optional audio block. The blueprint version is the interesting part: a version string and a checksum.

{
  "schemaVersion": 2,
  "contentBaseUrl": "https://.../api/content/v2",
  "languages": {
    "es": {
      "blueprint": { "version": "1.0.1", "checksum": "sha256:..." },
      "audio": { "version": "1.0.0", "files": [ { "name": "...", "checksum": "sha256:..." } ] }
    }
  }
}

One detail I am quietly proud of: the apps key off the checksum, not the version. The version is a human-readable hint pulled from the blueprint's own metadata. The checksum is the source of truth for "did this actually change," which sidesteps a whole class of bugs where content gets edited but a version number is forgotten. Bytes do not lie.
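
As a rough illustration, the manifest shape described above could be mirrored with Codable along these lines. This is a minimal sketch, not the app's actual model: the type names other than ContentVersion, the choice to treat generatedAt as an optional opaque string, and the example.invalid URL are all assumptions made for the example.

```swift
import Foundation

// Hypothetical Codable mirror of the manifest; names and optionality are assumptions.
struct ContentVersion: Codable, Equatable {
    let version: String   // human-readable hint only
    let checksum: String  // "sha256:<hex>", the value sync actually keys off
}

struct AudioFile: Codable, Equatable {
    let name: String
    let checksum: String
}

struct AudioBlock: Codable, Equatable {
    let version: String
    let files: [AudioFile]
}

struct LanguageEntry: Codable, Equatable {
    let blueprint: ContentVersion?  // optional, per the description above
    let audio: AudioBlock?          // optional
}

struct Manifest: Codable {
    let schemaVersion: Int
    let generatedAt: String?  // kept opaque; the timestamp format is not specified here
    let contentBaseUrl: String
    let languages: [String: LanguageEntry]
}

// Placeholder data for the sketch; not real content.
let sampleManifestJSON = """
{
  "schemaVersion": 2,
  "contentBaseUrl": "https://example.invalid/api/content/v2",
  "languages": {
    "es": { "blueprint": { "version": "1.0.1", "checksum": "sha256:abc" } }
  }
}
"""

// Missing optional keys (audio, generatedAt) decode as nil.
let manifest = try JSONDecoder().decode(Manifest.self, from: Data(sampleManifestJSON.utf8))
```

Modeling blueprint and audio as optionals lets a language appear in the manifest before its audio exists, without a decode failure.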

The checksum is the whole trick

The checksum is sha256: followed by the hex digest of the raw blueprint file bytes. On device, computing and comparing it is small and synchronous:

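import CryptoKit   // SHA256
import Foundation  // Data, String(format:)

// Excerpt: both functions are members of a larger type (note `static` and `stored`).

// Returns "sha256:" plus the lowercase hex digest of the raw bytes.
nonisolated static func sha256(of data: Data) -> String {
    let hash = SHA256.hash(data: data)
    return "sha256:" + hash.map { String(format: "%02x", $0) }.joined()
}

// A language needs a download when nothing is stored or the checksum differs.
func needsBlueprintUpdate(for languageId: String,
                          serverVersion: ContentVersion) -> Bool {
    guard let local = stored[languageId]?.blueprintVersion else { return true }
    return local.checksum != serverVersion.checksum
}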

That tiny comparison is the difference between "sync everything on every launch" and "sync the one language I corrected this morning." When I re-publish a fix for a single language, exactly one checksum changes, so exactly one blueprint downloads. The other forty-odd languages stay untouched on disk. The sync service walks every language in the manifest, asks the local store whether each checksum still matches, and collects only the mismatches into a small list of work to do.
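
The "collect only the mismatches" walk might look something like the sketch below. The function name and the plain dictionaries standing in for the manifest and the local store are assumptions for illustration; only the checksum comparison itself comes from the code above.

```swift
import Foundation

// Hypothetical stand-in for the app's version record.
struct ContentVersion: Equatable {
    let version: String
    let checksum: String
}

/// Returns the language IDs whose blueprint should be downloaded:
/// those with no local record, or whose stored checksum differs from the server's.
func languagesNeedingBlueprintUpdate(
    server: [String: ContentVersion],
    local: [String: ContentVersion]
) -> [String] {
    server
        .filter { entry in
            guard let stored = local[entry.key] else { return true }
            return stored.checksum != entry.value.checksum
        }
        .keys
        .sorted()  // deterministic order for logging and tests
}

let server = [
    "es": ContentVersion(version: "1.0.1", checksum: "sha256:bbb"),
    "fr": ContentVersion(version: "1.0.0", checksum: "sha256:ccc"),
    "de": ContentVersion(version: "1.0.0", checksum: "sha256:ddd"),
]
let local = [
    // Same version string as the server, but the bytes changed.
    "es": ContentVersion(version: "1.0.1", checksum: "sha256:aaa"),
    "fr": ContentVersion(version: "1.0.0", checksum: "sha256:ccc"),
]

let pending = languagesNeedingBlueprintUpdate(server: server, local: local)
// pending is ["de", "es"]: "de" was never downloaded, "es" changed, "fr" is skipped.
```

The "es" entry is the forgotten-version-bump case: the version strings match, yet the language is still picked up because only the checksum is compared.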

The same SHA256 doubles as integrity verification. After a blueprint downloads, the client recomputes the digest and compares it to the manifest value before writing anything to disk. A truncated or corrupted response throws a checksumMismatch and is discarded rather than silently saved. The hash earns its keep twice: once to decide whether to download, and once to confirm what arrived is what was promised.
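
A minimal sketch of that verify-before-write step, assuming a hypothetical ContentSyncError type and a verifyAndStore helper (neither name is from the app; only the checksumMismatch case name and the checksum format are taken from the text):

```swift
import CryptoKit
import Foundation

// Hypothetical error type; the associated values are an assumption.
enum ContentSyncError: Error, Equatable {
    case checksumMismatch(expected: String, actual: String)
}

// Same format as the manifest: "sha256:" plus lowercase hex of the raw bytes.
func sha256(of data: Data) -> String {
    "sha256:" + SHA256.hash(data: data).map { String(format: "%02x", $0) }.joined()
}

/// Recomputes the digest of the downloaded bytes and only writes them to disk
/// if it matches the checksum the manifest promised.
func verifyAndStore(_ data: Data, expectedChecksum: String, at destination: URL) throws {
    let actual = sha256(of: data)
    guard actual == expectedChecksum else {
        // Nothing has been written yet, so a bad download leaves the old copy intact.
        throw ContentSyncError.checksumMismatch(expected: expectedChecksum, actual: actual)
    }
    try data.write(to: destination, options: .atomic)
}
```

Checking before the write, and writing atomically, means a failed or interrupted download should never replace a good blueprint with a partial one.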

Text first, audio on demand

The sync service is an actor, which keeps all of its mutable bookkeeping safe under Swift concurrency without a single lock in sight. It runs a deliberately prioritized pass: manifest first, then blueprints, then UI localizations, and audio is handled separately. Blueprints are what a test actually needs to run, so they download proactively for every language. Audio is large and most users only ever open one or two languages, so I do not pull it up front at all.

On launch, the app fetches the manifest and downloads only changed blueprints and localizations.
Audio for a language is fetched lazily, the first time a user selects that language to test in.
Audio is checked file by file: each clip has its own checksum, so re-publishing one corrected recording re-downloads one file, not the whole set (around eight clips per language).

Two cooldowns guard all of this: a short one on manifest checks (about 30 minutes) so foregrounding repeatedly does not hammer the server, and a much longer minimum interval on full syncs (about a week). A force flag bypasses both for first launch and manual refresh. The manifest fetch itself uses .reloadIgnoringLocalCacheData so a freshly published fix is never hidden behind a stale HTTP cache, a subtle gotcha I hit the first time a corrected blueprint stubbornly refused to appear in testing.
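
The cooldown checks and the cache-bypassing request could be sketched as below. The SyncCooldowns type, its method names, and makeManifestRequest are hypothetical; the intervals are the approximate values mentioned above, and .reloadIgnoringLocalCacheData is the standard URLRequest cache policy.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

// Hypothetical policy type; the defaults are the approximate values from the text.
struct SyncCooldowns {
    var manifestCheckInterval: TimeInterval = 30 * 60          // about 30 minutes
    var fullSyncInterval: TimeInterval = 7 * 24 * 60 * 60      // about a week

    func shouldCheckManifest(lastCheck: Date?, now: Date = Date(), force: Bool = false) -> Bool {
        isDue(last: lastCheck, interval: manifestCheckInterval, now: now, force: force)
    }

    func shouldRunFullSync(lastSync: Date?, now: Date = Date(), force: Bool = false) -> Bool {
        isDue(last: lastSync, interval: fullSyncInterval, now: now, force: force)
    }

    private func isDue(last: Date?, interval: TimeInterval, now: Date, force: Bool) -> Bool {
        if force { return true }                      // first launch or manual refresh
        guard let last = last else { return true }    // never ran before
        return now.timeIntervalSince(last) >= interval
    }
}

/// Builds a manifest request that skips the local HTTP cache.
func makeManifestRequest(url: URL) -> URLRequest {
    var request = URLRequest(url: url)
    request.cachePolicy = .reloadIgnoringLocalCacheData
    return request
}
```

Passing now as a parameter rather than reading the clock inside the check keeps the cooldown logic easy to test without waiting half an hour.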

Offline is not optional

A content server keyed by checksums is wonderful, right up until you remember a placement test needs to start immediately, on a plane, in a basement, with no connection at all. So every blueprint also ships bundled inside the app as an offline fallback. The server is an enhancement, never a dependency.

The loader encodes that priority directly. It reads the downloaded copy first, because that is the most recently corrected version, and falls back to the bundled blueprint that shipped with the binary if no download exists. There is even a small bit of defense-in-depth: if a downloaded file somehow fails to decode, the loader logs it and falls through to the bundled copy rather than crashing a user mid-test.

static func loadBlueprintWithFallback(for languageId: String)
    throws -> (blueprint: TestBlueprint, source: BlueprintSource) {
    if let data = ContentPaths.downloadedBlueprintData(for: languageId) {
        do { return (try decodeBlueprint(from: data), .downloaded) }
        catch { /* log, fall through to bundled */ }
    }
    return (try loadBundledBlueprint(for: languageId), .bundled)
}

The bundled copy is what makes "works fully offline" real. A brand-new install with no network still has a complete, scoreable test for every supported language on first launch (each blueprint is roughly 20 KB and holds about 40 items). The content pipeline then quietly catches that install up to the latest corrections the next time it has a connection.

What publishing a fix actually looks like

The other half of the story lives in the web repo, which serves the content API. A build script reads the authored question files, copies each into blueprints/{iso}.json, runs shasum -a 256 over the bytes, and regenerates the manifest's blueprint entries. So correcting a wrong answer is: edit the question source, run the build, deploy. The manifest's checksum for that one language flips, and every device picks up the fix on its next poll.

That pipeline has its own sharp edge worth naming honestly: the blueprint regeneration step is destructive. It clears and rebuilds the blueprints directory, though it deliberately leaves audio entries untouched. It also depends on the authored content being the true source. Fix something directly in the served copy instead of the source and the next build silently reverts it. Knowing exactly which step owns which file is the price of decoupling them.

The product is a test. The test is data. Treating that data as something I can ship on its own schedule, verified by a checksum rather than fused to the app binary, is what keeps the whole thing maintainable across dozens of languages.

What the split actually helped with

The payoff showed up in time and in posture. The common class of content fixes, a bad distractor, a mislabeled answer, a typo in a stem, went from a days-long release to a minutes-long push. But the part that surprised me was emotional. When fixing a wrong answer means submitting a build and waiting for review, you brace a little before you look at your own data. When fixing it means editing a file and deploying, you can afford to be honest about content quality, because honesty stopped being expensive. That changed how willing I was to go looking for mistakes at all.

Takeaways

Separate code and content into distinct ship channels so urgent data fixes are not held hostage by App Store review.
Key sync on a content checksum, not a version string, so "did this change" is decided by bytes and cannot be defeated by a forgotten version bump.
Reuse the same SHA256 for integrity: verify what arrived matches the manifest before writing it to disk.
Prioritize the sync (blueprints eagerly, audio lazily) and add cooldowns so a server-driven model stays cheap and polite.
Always ship a bundled offline fallback with a downloaded-first, bundled-second loader, so the first launch with no network still has a complete, answerable test.
Decoupling has a cost: be explicit about which step owns which file, or the source and the served copy will quietly drift.
