Generating routines with an on-device model

Sequence lets you describe a routine in plain language, something like "HIIT: 30s work, 10s rest, 8 rounds", and get a drafted set of steps back. The part I found most interesting was where that drafting happens: entirely on the device, using Apple's FoundationModels framework on iOS 26, so nothing about your routine ever leaves the phone.

Why on device at all

Sequence is a local, private app. No accounts, no subscriptions, no server. A feature that phoned out to a cloud model would have quietly worked against that: the whole idea of "describe a routine and it gets drafted" is that the description is yours, and it felt right to keep it that way. The on-device language model on iOS 26 meant I could offer generation without sending anyone's words off to a backend I would then have to operate, secure, and pay for.

It also fit the architecture. Routines and history already live only on the device in SwiftData, and generation produces the same domain types the rest of the app uses, so it slots in rather than carving out an exception. The whole thing sits behind a #if canImport(FoundationModels) guard, so the project still builds against older SDKs, with the AI surface simply absent.

Availability is not a yes or no

The first thing the generation service does is ask whether the model is actually usable, because on-device intelligence is not guaranteed to be present. The hardware may not support it, the model may still be downloading, storage may be too low, or the person may simply not have Apple Intelligence turned on. Lumping all of that into one "AI is unavailable" error would leave people stuck with no idea what to do next.

So I read SystemLanguageModel.default.availability up front. It is an enum: .available, or .unavailable(reason). I switch on it and map each unavailable reason to its own localized message. One detail worth flagging: I match on the reason's description rather than its concrete cases, because the exact case names on a brand-new framework are a moving target, and I would rather degrade gracefully than fail to compile against a future point release.

Device not supported: a clear, final message, and the generate button never appears.
Model not ready: ask the person to try again once the download finishes.
Low storage: explain that space is needed for the model.
Apple Intelligence not enabled: point at Settings > Apple Intelligence & Siri.

The button itself is gated on a simple isAvailable computed property, so on a device that cannot run the model the feature is invisible rather than broken. Everything else in the app keeps working regardless.
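A minimal sketch of that availability check, not the app's actual code. The helper names (availabilityMessage, unavailableMessage) and the keywords matched against the reason's description are assumptions for illustration, and a real app would return localized strings. The FoundationModels part is compiled only where the framework can be imported.

```swift
import Foundation
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Hypothetical helper: maps the textual description of an unavailable
/// reason to a user-facing message. The keywords are assumptions.
func availabilityMessage(forReasonDescription description: String) -> String {
    let text = description.lowercased()
    if text.contains("eligible") || text.contains("supported") {
        return "This device does not support on-device generation."
    }
    if text.contains("ready") || text.contains("download") {
        return "The model is still downloading. Try again once it finishes."
    }
    if text.contains("storage") || text.contains("space") {
        return "Free up some storage so the model can be installed."
    }
    if text.contains("enabled") {
        return "Turn on Apple Intelligence in Settings > Apple Intelligence & Siri."
    }
    // Unknown reason: degrade to a generic message instead of failing.
    return "On-device generation is unavailable right now."
}

#if canImport(FoundationModels)
/// Returns nil when the model is usable, otherwise a message to show.
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
func unavailableMessage() -> String? {
    if case .unavailable(let reason) = SystemLanguageModel.default.availability {
        return availabilityMessage(forReasonDescription: String(describing: reason))
    }
    return nil
}

/// Gate for the generate button: hidden whenever a message exists.
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
var isAvailable: Bool { unavailableMessage() == nil }
#endif
```

Matching on the description string trades compile-time exhaustiveness for resilience: a new reason in a later release falls through to the generic message rather than breaking the build.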

Shaping the prompt to return data, not prose

I did not want a paragraph describing a workout. I wanted structured data I could turn into real, playable steps. Sequence is built around a CardKind enum with associated config per case: countdown, countup, info, repetition, rest, interval, checklist, and randomDuration. So the prompt asks the model to return only a JSON object with a title and an actions array, where each action has a type drawn from exactly that list plus the fields that type needs.
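For readers who have not seen the pattern, an enum with associated config per case could look roughly like this. Only the eight case names and CountdownConfig's two fields come from this post; every other payload, typeName, and allTypeNames are hypothetical placeholders.

```swift
// Only CountdownConfig(durationSeconds:autoComplete:) appears in the post;
// all other payloads below are hypothetical placeholders.
struct CountdownConfig {
    var durationSeconds: Int
    var autoComplete: Bool
}

enum CardKind {
    case countdown(CountdownConfig)
    case countup(autoComplete: Bool)
    case info(text: String)
    case repetition(sets: Int, reps: Int)
    case rest(durationSeconds: Int)
    case interval(workSeconds: Int, restSeconds: Int, rounds: Int)
    case checklist(items: [String])
    case randomDuration(minSeconds: Int, maxSeconds: Int)

    /// The string the prompt asks the model to put in an action's "type" field.
    var typeName: String {
        switch self {
        case .countdown: return "countdown"
        case .countup: return "countup"
        case .info: return "info"
        case .repetition: return "repetition"
        case .rest: return "rest"
        case .interval: return "interval"
        case .checklist: return "checklist"
        case .randomDuration: return "randomDuration"
        }
    }

    /// Hypothetical list used to spell out the allowed types in the prompt.
    static let allTypeNames = [
        "countdown", "countup", "info", "repetition",
        "rest", "interval", "checklist", "randomDuration",
    ]
}
```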

To anchor the format I include one fully formed example in the prompt. Showing the model a complete HIIT routine did far more to lock in the structure than any amount of describing the structure in words. The example becomes the contract:

Return ONLY a JSON object (no markdown, no explanation):
{"title":"sequence name","actions":[...]}

Each action has "title" and "type".
For countdown/rest, also "duration" (seconds).
For repetition, also "sets" and "reps".

Example for HIIT:
{"title":"HIIT Workout","actions":[
  {"title":"Warm Up","type":"countdown","duration":60},
  {"title":"Work","type":"countdown","duration":30},
  {"title":"Rest","type":"rest","duration":10},
  {"title":"Cool Down","type":"countdown","duration":60}
]}

Calling the model is then almost anticlimactic: spin up a LanguageModelSession, await session.respond(to: prompt), and read response.content. The interesting work is on either side of that line.
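As a rough sketch of that call, assuming a hypothetical makePrompt(for:) helper that appends the person's description to an abbreviated copy of the format rules (the function names and the "Routine to draft:" line are illustrative, not taken from the app):

```swift
import Foundation
#if canImport(FoundationModels)
import FoundationModels
#endif

/// Hypothetical prompt builder; the format rules are abbreviated here.
func makePrompt(for userDescription: String) -> String {
    """
    Return ONLY a JSON object (no markdown, no explanation):
    {"title":"sequence name","actions":[...]}

    Each action has "title" and "type".

    Routine to draft: \(userDescription)
    """
}

#if canImport(FoundationModels)
/// Sends the prompt to the on-device model and returns its raw text.
@available(iOS 26.0, macOS 26.0, visionOS 26.0, *)
func draftRoutineText(for userDescription: String) async throws -> String {
    let session = LanguageModelSession()
    let response = try await session.respond(to: makePrompt(for: userDescription))
    return response.content // still untrusted text at this point
}
#endif
```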

Parsing defensively, because models improvise

Even when you say "return only JSON," a model will sometimes wrap it in a markdown code fence, prepend a sentence, or rename a field. A strict Codable decode against that fails for reasons that have nothing to do with the routine itself. So the parser treats the output as untrusted text and works in layers.

First it normalizes: trim whitespace, strip a leading ```json or ``` fence and any trailing one, then slice from the first { to the last } so a chatty preamble cannot break the decode. Then it tries the strict path, decoding into a small private Decodable struct.
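The normalization step can be sketched as a small pure function. The name extractJSONObject is hypothetical, and this version only handles a fence at the very start or end of the trimmed text; anything else is left to the brace slicing.

```swift
import Foundation

/// Hypothetical helper: pulls the outermost {...} span out of model output.
func extractJSONObject(from raw: String) -> String? {
    let fence = String(repeating: "`", count: 3) // a markdown code fence
    var text = raw.trimmingCharacters(in: .whitespacesAndNewlines)

    // Strip a leading fence, with or without the "json" language tag.
    if text.hasPrefix(fence + "json") {
        text.removeFirst(fence.count + 4)
    } else if text.hasPrefix(fence) {
        text.removeFirst(fence.count)
    }
    // Strip a trailing fence.
    if text.hasSuffix(fence) {
        text.removeLast(fence.count)
    }

    // Slice from the first "{" to the last "}" so any preamble or
    // sign-off around the object is ignored.
    guard let start = text.firstIndex(of: "{"),
          let end = text.lastIndex(of: "}"),
          start <= end else { return nil }
    return String(text[start...end])
}
```

The function returns nil when no braces are found, which lets the caller treat "the model returned no object at all" as its own failure case.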

The field names on that struct are deliberately forgiving. A countdown's duration might come back as duration, durationSeconds, or seconds; an info step's text might be body, message, or text. The struct declares all of them as optionals and a toCardKind() method picks the first that is present. If the strict decode still fails, it falls back to a hand-rolled JSONSerialization pass that walks the dictionary with the same aliases and the same clamping. Two parsers, one set of rules.

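case "countdown", "timer":
    // Take the first alias the model supplied, defaulting to 30 seconds,
    // then clamp so the card always runs for at least one second.
    let dur = max(1, duration ?? durationSeconds ?? seconds ?? 30)
    return .countdown(CountdownConfig(
        durationSeconds: dur,
        autoComplete: autoComplete ?? true
    ))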

Two small defenses earned their keep. Every numeric value is clamped with max(1, ...) so a model that emits a zero or negative duration cannot produce a card that finishes instantly or, worse, never. And the whole thing uses compactMap: any action that cannot be turned into a valid CardKind is dropped rather than aborting the routine, and I only return a result if at least one card survived. A draft with five good steps and one quietly skipped beats no draft at all.
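Put together, the strict path might look something like the sketch below. The types are simplified stand-ins (LenientAction, LenientSequence, parseCards, and the three-case CardKind are assumptions for illustration), but the aliases, the max(1, ...) clamp, and the compactMap rule follow the description above.

```swift
import Foundation

// Simplified stand-ins for the app's domain types (assumptions).
struct CountdownConfig: Equatable {
    var durationSeconds: Int
    var autoComplete: Bool
}

enum CardKind: Equatable {
    case countdown(CountdownConfig)
    case rest(durationSeconds: Int)
    case repetition(sets: Int, reps: Int)
}

private struct LenientAction: Decodable {
    let title: String?
    let type: String?
    // Aliases the model might use for the same value.
    let duration: Int?
    let durationSeconds: Int?
    let seconds: Int?
    let sets: Int?
    let reps: Int?
    let autoComplete: Bool?

    func toCardKind() -> CardKind? {
        // First alias present wins; clamp so no card lasts under a second.
        let dur = max(1, duration ?? durationSeconds ?? seconds ?? 30)
        switch (type ?? "").lowercased() {
        case "countdown", "timer":
            return .countdown(CountdownConfig(
                durationSeconds: dur,
                autoComplete: autoComplete ?? true
            ))
        case "rest":
            return .rest(durationSeconds: dur)
        case "repetition":
            return .repetition(sets: max(1, sets ?? 1), reps: max(1, reps ?? 1))
        default:
            return nil // unknown type: dropped by compactMap below
        }
    }
}

private struct LenientSequence: Decodable {
    let title: String?
    let actions: [LenientAction]
}

/// Returns the surviving cards, or nil if decoding failed or none survived.
func parseCards(from json: String) -> [CardKind]? {
    guard let decoded = try? JSONDecoder().decode(
        LenientSequence.self, from: Data(json.utf8)
    ) else { return nil }
    let cards = decoded.actions.compactMap { $0.toCardKind() }
    return cards.isEmpty ? nil : cards
}
```

Note that a duration sent as a string or a fraction still makes the strict decode throw, which is the kind of input a looser second pass has to absorb.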

The model is a drafting tool, not an oracle. I give it a tight format, expect it to occasionally color outside the lines, and clean up after it before anything reaches the data model.
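The hand-rolled fallback pass mentioned earlier in this section could be sketched along these lines. DraftStep, firstInt, and fallbackSteps are hypothetical names, and the output is reduced to a type and a duration to keep the example short; the aliases and the clamp mirror the strict path.

```swift
import Foundation

/// Hypothetical, reduced output type for this sketch.
struct DraftStep: Equatable {
    var type: String
    var seconds: Int
}

/// Returns the first value among `keys` that can be read as an integer,
/// accepting whole numbers, fractions, and numeric strings.
func firstInt(in dict: [String: Any], keys: [String]) -> Int? {
    for key in keys {
        if let n = dict[key] as? Int { return n }
        if let d = dict[key] as? Double, let n = Int(exactly: d.rounded()) { return n }
        if let s = dict[key] as? String, let d = Double(s),
           let n = Int(exactly: d.rounded()) { return n }
    }
    return nil
}

/// Walks the parsed dictionary by hand, applying the same aliases and clamp.
func fallbackSteps(from json: String) -> [DraftStep]? {
    guard let root = try? JSONSerialization.jsonObject(with: Data(json.utf8)) as? [String: Any],
          let actions = root["actions"] as? [Any] else { return nil }

    let steps = actions.compactMap { item -> DraftStep? in
        // Skip anything that is not an object with a string "type".
        guard let action = item as? [String: Any],
              let type = action["type"] as? String else { return nil }
        let raw = firstInt(in: action, keys: ["duration", "durationSeconds", "seconds"]) ?? 30
        return DraftStep(type: type.lowercased(), seconds: max(1, raw))
    }
    return steps.isEmpty ? nil : steps
}
```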

A draft, never an autosave

One product decision shaped the rest: generation produces a GeneratedSequence the person reviews and edits before anything is saved. It is never written straight to SwiftData. That framing took the pressure off perfect parsing. The model only has to get close; the editor is where it becomes right. It also means a slightly odd result is an easy tweak instead of a wrong routine silently living in your library.

What I took away

On-device generation let me add a real "describe it and draft it" feature while keeping the app fully local and private, with no backend to run.
Model availability has several distinct failure reasons; reading SystemLanguageModel.default.availability and mapping each to its own message turns a dead end into a next step.
Anchoring the prompt with one complete example did more to enforce the JSON shape than any prose description of the schema.
Generated text is untrusted input. Normalize it, accept field aliases, clamp the numbers, drop the bad rows with compactMap, and keep a fallback parser. The feature stops feeling flaky.
Treat the output as a draft to be reviewed, not a value to be committed. That single decision absorbs most of the model's imperfection.
