Tiles: study, recall, feedback, and a scheduler that picks the board

Most of my Japanese study cluster is plain HTML and JavaScript served from a static directory. Japanese Tiles is the exception. It is the newest of the bunch and the one where I reached for a fuller toolchain: React 19, TypeScript, Vite, Tailwind 4, and Zustand for state. It earned that toolchain because it tries to do something the simpler tools do not. It shapes a study session into three deliberate phases, and it lets a spaced-repetition scheduler run quietly underneath, deciding not just which words you see but how many land on the board at once.

A round in three phases

A flashcard asks one question at a time and grades you on the spot. That works, but it collapses studying and testing into a single moment, so you rarely get to look at a fresh set of words before you are asked to produce them. Tiles separates those moments on purpose. The phase is just a small union type in the store, 'idle' | 'study' | 'recall' | 'feedback', and the whole UI keys off it.

Study. You see the words and their meanings together. No pressure, no scoring, just exposure.
Recall. The words become tiles. The slots show the readings and English meanings in shuffled order, and you match each Japanese tile to the slot it belongs in.
Feedback. The board shows what you placed against what was correct, so the round closes with a clear picture of where you stood.

The matching step turned out to be the heart of it. The interaction is tap-to-select, then tap-to-place: you tap a Japanese tile in the bank, then tap the slot you think it belongs to. I expected to need a drag-and-drop library and was pleased not to. Tap-to-place is two pieces of store state, a selectedTileId and a placements map keyed by slot, and it behaves identically on a phone and a trackpad with no pointer-event wrangling. Tapping a filled slot returns its tile to the bank, and tapping a slot while a tile is selected swaps cleanly. The win that mattered was conceptual, not visual: because you are reasoning about a small set all at once, the phase surfaces the words you confused with each other, not just the ones you forgot outright.
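As a rough illustration of how little state that interaction needs, here is a minimal sketch written as pure functions rather than as the actual store. Only selectedTileId and placements come from the post; the PlacementState, tapTile, and tapSlot names are hypothetical, and the sketch assumes that a tile displaced by a swap simply returns to the bank.

```typescript
// Hypothetical shape: only selectedTileId and placements are named in the post.
interface PlacementState {
  selectedTileId: string | null
  placements: Record<string, string> // slotId -> tileId currently in that slot
}

// Tapping a tile in the bank selects it; tapping the same tile again deselects it.
function tapTile(state: PlacementState, tileId: string): PlacementState {
  const selectedTileId = state.selectedTileId === tileId ? null : tileId
  return { ...state, selectedTileId }
}

// Tapping a slot either places the selected tile or, with nothing selected,
// empties the slot. The bank is implicit: any tile not present in placements.
function tapSlot(state: PlacementState, slotId: string): PlacementState {
  const placements = { ...state.placements }
  const tileId = state.selectedTileId

  if (tileId === null) {
    delete placements[slotId] // a filled slot gives its tile back to the bank
    return { selectedTileId: null, placements }
  }

  // Make sure the tile never sits in two slots at once.
  for (const otherSlot of Object.keys(placements)) {
    if (placements[otherSlot] === tileId) delete placements[otherSlot]
  }

  // Assumption: whatever was in this slot is overwritten and returns to the bank.
  placements[slotId] = tileId
  return { selectedTileId: null, placements }
}
```

Because both functions return a new state object, they would drop into a Zustand set call or a reducer without change, and neither cares whether the tap came from a finger or a pointer.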

Scoring stays boring on purpose

It would be tempting to score recall with partial credit or fuzzy matching. I decided not to. Every word's correct home is its own slot, so grading is a flat comparison of each slot's placed tile against its target. A slot is right or wrong, and the round's result is the tally. The whole function fits in a few lines:

export function scoreRound(
  placements: Record<string, string>,   // slotWordId -> placedWordId
  correctMapping: Record<string, string> // slotWordId -> correctWordId
): RoundResult[] {
  return Object.keys(correctMapping).map((slotId) => ({
    wordId: slotId,
    correct: placements[slotId] === correctMapping[slotId],
  }))
}

The simplicity is deliberate, not a shortcut. The interesting logic lives in the scheduler that decides which words show up. Keeping grading boring means the per-word signal feeding that scheduler stays unambiguous: a clean placement is a pass, anything else is a miss, with no half-states to reconcile later. Each result becomes an SM-2 quality of 5 for correct or 1 for wrong, and that single number drives everything downstream.
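The hand-off from grading to scheduling can be sketched in a few lines. The RoundResult fields and the 5-or-1 mapping come from the text above; the toQuality and roundAccuracy names are hypothetical, and expressing the round's tally as a rounded percentage is an assumption.

```typescript
// Shape taken from the scoreRound excerpt: one entry per slot.
interface RoundResult {
  wordId: string
  correct: boolean
}

// Hypothetical helper: a clean placement is SM-2 quality 5, anything else is 1.
function toQuality(result: RoundResult): 1 | 5 {
  return result.correct ? 5 : 1
}

// Hypothetical helper: the round's tally as a whole-number percentage (assumed format).
function roundAccuracy(results: RoundResult[]): number {
  if (results.length === 0) return 0
  const right = results.filter((r) => r.correct).length
  return Math.round((right / results.length) * 100)
}
```

With only two possible qualities, every review falls cleanly on one side of the SM-2 pass threshold of 3, which is what keeps the per-word signal unambiguous.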

An SM-2 variant underneath

Underneath the board is a spaced-repetition scheduler, a variant of the classic SM-2 algorithm. Each word carries a small UserWordState: a stability (the current interval in days), a difficulty (the SM-2 easiness factor, starting at 2.5), a dueAt timestamp, and running counts of successes and failures. The whole map persists to localStorage through Zustand's persist middleware, so your history survives a reload with no backend involved.

On review, the easiness factor moves by the standard SM-2 response curve and is then floored. The floor turned out to matter: a correct answer (quality 5) adds 0.1, but a miss (quality 1) subtracts 0.54, so without a floor a word you keep missing would push its factor from 2.5 to below zero within five misses and its intervals would collapse into noise. Clamping at 1.3 means even your hardest words eventually earn breathing room once you start getting them right.

// SM-2 variant: quality 5 = correct, quality 1 = incorrect
let ef = state.difficulty
ef = ef + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
ef = Math.max(1.3, ef)               // easiness factor never falls below 1.3

let interval: number
const reps = state.successes
if (quality >= 3) {
  if (reps === 0) interval = 1         // first correct review: 1 day
  else if (reps === 1) interval = 6    // second: 6 days
  else interval = Math.round(state.stability * ef)
} else {
  interval = 1                         // a miss resets the interval to 1 day
}

So a word's first two correct reviews land at one day and then six days, fixed waypoints that get every word past the initial cramming stage. Only after that does the interval scale by stability * ef, fanning out faster for easy words and staying tight for stubborn ones. A miss does not merely shrink the interval; it snaps it back to a single day, so a forgotten word is back in front of you tomorrow.
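To watch the schedule play out, the fragment above can be wrapped in a function and fed a short review history. This is a sketch rather than the app's code: the review name, the DAY_MS constant, and the way stability, dueAt, and the counters are written back are all assumptions, since the post shows only the easiness and interval arithmetic.

```typescript
// Fields as described in the post; the write-back logic below is assumed.
interface UserWordState {
  stability: number  // current interval in days
  difficulty: number // SM-2 easiness factor, starts at 2.5
  dueAt: number      // timestamp in ms, 0 for a word never reviewed
  successes: number
  failures: number
}

const DAY_MS = 24 * 60 * 60 * 1000

// Hypothetical wrapper around the easiness and interval rules shown above.
function review(state: UserWordState, quality: number, now: number): UserWordState {
  const d = 5 - quality
  const ef = Math.max(1.3, state.difficulty + (0.1 - d * (0.08 + d * 0.02)))

  let interval: number
  if (quality >= 3) {
    if (state.successes === 0) interval = 1
    else if (state.successes === 1) interval = 6
    else interval = Math.round(state.stability * ef)
  } else {
    interval = 1
  }

  const passed = quality >= 3
  return {
    stability: interval,
    difficulty: ef,
    dueAt: now + interval * DAY_MS,
    successes: state.successes + (passed ? 1 : 0),
    failures: state.failures + (passed ? 0 : 1),
  }
}

// A fresh word reviewed as: correct, correct, correct, miss, correct.
let word: UserWordState = { stability: 0, difficulty: 2.5, dueAt: 0, successes: 0, failures: 0 }
const intervals: number[] = []
for (const quality of [5, 5, 5, 1, 5]) {
  word = review(word, quality, Date.now())
  intervals.push(word.stability)
}
console.log(intervals.join(', ')) // 1, 6, 17, 1, 2
```

Under these assumptions the three correct reviews give intervals of 1, 6, and 17 days (6 * 2.8, rounded). The miss resets the interval to 1 day and drops the factor to about 2.26, so the next correct review lands only 2 days out: the word has to earn its long intervals back.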

Three buckets feed each round

The starting set is 250 JLPT N5 words, the vocabulary a beginner meets first. From that pool the selector derives three states and fills the board from them in priority order. A word reads as new if it has no saved state at all (dueAt === 0), due if its dueAt has passed, and weak if its difficulty has dropped below 2.0 while it is not yet due. Those three buckets are filled with rough proportions:

Up to about 60% of the board from due words, sorted most-overdue first so the schedule actually drives the session.
Up to about 20% from weak words, shuffled, so struggling items keep resurfacing even when they are not technically due.
The remainder from new words, introducing fresh vocabulary at a measured pace.

There is a deliberate fallback at the end: if those three buckets cannot fill the board (early on, or once you have nearly cleared a level), it tops up from any remaining filtered words so a round is never short. The final list is shuffled before it reaches the screen, so position never leaks a hint about which words are review and which are new.
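The selection described above could be sketched as follows. The bucket definitions and rough proportions come from the post; the function names, the use of Math.ceil for the "about 60%" and "about 20%" caps, and the Fisher-Yates shuffle are assumptions made for illustration.

```typescript
// Only the two fields the selector needs; the full state has more.
interface WordState {
  difficulty: number
  dueAt: number
}

type Bucket = 'new' | 'due' | 'weak' | 'other'

function classify(state: WordState | undefined, now: number): Bucket {
  if (!state || state.dueAt === 0) return 'new'
  if (state.dueAt <= now) return 'due'
  if (state.difficulty < 2.0) return 'weak'
  return 'other'
}

// Fisher-Yates shuffle on a copy, so the input order is left alone.
function shuffle<T>(items: readonly T[]): T[] {
  const out = [...items]
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    const tmp = out[i]
    out[i] = out[j]
    out[j] = tmp
  }
  return out
}

// Hypothetical selector: fills `size` slots from due, weak, then new words.
function selectBoard(
  wordIds: string[],
  states: Record<string, WordState>,
  size: number,
  now: number
): string[] {
  const buckets: Record<Bucket, string[]> = { new: [], due: [], weak: [], other: [] }
  for (const id of wordIds) buckets[classify(states[id], now)].push(id)

  // Smallest dueAt first means most overdue first.
  buckets.due.sort((a, b) => states[a].dueAt - states[b].dueAt)

  const picked: string[] = []
  const take = (from: string[], cap: number) =>
    picked.push(...from.slice(0, Math.min(cap, size - picked.length)))

  take(buckets.due, Math.ceil(size * 0.6))
  take(shuffle(buckets.weak), Math.ceil(size * 0.2))
  take(shuffle(buckets.new), size)

  // Fallback: top up from anything not yet chosen so the round is never short.
  if (picked.length < size) {
    const chosen = new Set(picked)
    take(shuffle(wordIds.filter((id) => !chosen.has(id))), size)
  }
  return shuffle(picked) // final order carries no hint about the source bucket
}
```

Capping each take at the space still free on the board keeps the rounding from overfilling very small boards, which matters when the adaptive size is a single pair.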

The fun part is matching tiles. The useful part is that the tiles on the board, and how many there are, were chosen for you.

The board grows and shrinks with you

The detail I am most fond of is how many tiles appear. In adaptive mode the board size is not fixed; it is derived from your recent accuracy. The store keeps the last ten round scores in recentAccuracies, and the size is a running walk over that history.

export function getAdaptiveBoardSize(recentAccuracies: number[]): number {
  if (recentAccuracies.length === 0) return 1
  let size = 1
  for (const acc of recentAccuracies) {
    if (acc >= 70) size++   // a good round widens the board
    else size--             // a rough round narrows it
  }
  return Math.min(10, Math.max(1, size))   // clamp to 1..10
}

You start with a single pair, which sounds trivial until you watch a true beginner. Clearing a round at 70% or better widens the next board by one; falling short narrows it. The size is clamped between 1 and 10, so a bad streak bottoms out at a single pair and a hot streak tops out at a board that is still glanceable. It is a tiny piece of code, but it does the work a difficulty slider would otherwise push onto the learner, who is the last person well placed to judge their own level mid-session.
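A few worked inputs make the walk concrete. The function is repeated from the excerpt above so the snippet runs on its own; the recordAccuracy helper and its slice-based window are assumptions about how the last ten scores might be kept.

```typescript
// Repeated from the post so this snippet is self-contained.
function getAdaptiveBoardSize(recentAccuracies: number[]): number {
  if (recentAccuracies.length === 0) return 1
  let size = 1
  for (const acc of recentAccuracies) {
    if (acc >= 70) size++
    else size--
  }
  return Math.min(10, Math.max(1, size))
}

// Hypothetical helper: append the latest score and keep only the last ten.
function recordAccuracy(recent: number[], accuracy: number): number[] {
  return [...recent, accuracy].slice(-10)
}

let history: number[] = []
for (const acc of [80, 90, 50]) history = recordAccuracy(history, acc)

const afterThreeRounds = getAdaptiveBoardSize(history)           // 1 + 1 + 1 - 1 = 2
const afterSlump = getAdaptiveBoardSize([50, 50, 50, 80, 80])    // walk ends at 0, clamped to 1
const afterHotStreak = getAdaptiveBoardSize(new Array(10).fill(95)) // walk ends at 11, clamped to 10
```

The middle case is worth noticing: the clamp is applied only once, after the walk, so three rough rounds still count against the two good ones that follow and the board stays at a single pair until the slump ages out of the window or is outweighed.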

What I took away

Splitting a round into study, recall, and feedback gives exposure and testing their own moments, and matching a whole small set at once surfaces confusions a single-card quiz hides.
Tap-to-select then tap-to-place beat drag-and-drop: two pieces of state, identical on touch and trackpad, no pointer-event library.
Keep scoring boring. A flat right-or-wrong comparison gives the scheduler an unambiguous quality of 5 or 1 and keeps the complexity where it belongs.
Floor the easiness factor at 1.3 and reset a missed word to a one-day interval, while fixed early waypoints at one and six days carry every word past cramming.
Three derived states, new, due, and weak, filled at roughly 60/20/rest with a top-up fallback, were enough to build rounds that target what you actually struggle with.
Let the board size itself. Deriving it from recent accuracy means the tool adapts to the learner instead of asking the learner to configure the tool.

Tiles started as the flashiest tool in the cluster and turned into the one that taught me the most about scheduling. What I came to appreciate was not the animation but the quiet bookkeeping: the easiness factors, the due timestamps, and the little running tally of recent accuracy that together decide what lands on the board and how much of it.
