
Rewarding speed: how Quizo scores chat answers by who got there first.

Live trivia in a stream chat feels more like a race than a survey. In Quizo, getting the answer right is the price of entry; getting it right first is what earns the points. That sounds like a one-line rule, but once timing decides the winners, the consequences fan out into the chat parser, the database schema, and the test suite. Here is the real machinery, and the small decisions that made it work.

A race, not a survey

The easy version of quiz scoring gives a point to everyone who answers correctly. It is fair, and it is also flat, and in a live round it took all the energy out of the moment. I wanted the scramble where chat races to type a single letter before anyone else, so Quizo scores on speed. When a round resolves, the engine pulls the correct answers ordered by time and awards points down a fixed ladder:

// Points awarded by rank (1st=3, 2nd=2, 3rd=1, then 0)
const POINTS = [3, 2, 1, 0, 0] as const;

// Correct answers, fastest first, capped at the top 5
const correctAnswers = db.prepare(`
  SELECT id, username, username_normalized, timestamp
  FROM answers
  WHERE session_id = ? AND question_index = ? AND is_correct = 1
  ORDER BY timestamp ASC
  LIMIT 5
`).all(session.id, this.currentQuestionIndex);

Notice that 4th and 5th place appear in the list but earn zero. I kept them on purpose: the results overlay shows the top five names even when only the first three score, so being fast-but-fourth still gets you on screen. The ladder is short by design too. If everyone who is correct gets a point, speed stops mattering; a steep 3/2/1 cliff keeps the front of the pack worth fighting for.
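As a rough illustration of how the ladder might be applied to the query result, here is a minimal sketch. The awardPoints function and the CorrectAnswer and RankedAnswer shapes are hypothetical names for this example, not Quizo's actual code; the sketch assumes the rows arrive already sorted fastest-first and capped at five.

```typescript
// Points by rank, as in the post: 1st=3, 2nd=2, 3rd=1, then 0.
const POINTS = [3, 2, 1, 0, 0] as const;

// Hypothetical row shape: only the fields the ladder needs.
interface CorrectAnswer {
  username: string;
  timestamp: number;
}

// Hypothetical result shape, mirroring what the overlay displays.
interface RankedAnswer {
  username: string;
  points: number;
  rank: number;
}

// Assumes `rows` is already ordered fastest-first and capped at five,
// which is what the ORDER BY timestamp ASC LIMIT 5 query returns.
function awardPoints(rows: readonly CorrectAnswer[]): RankedAnswer[] {
  return rows.map((row, i) => ({
    username: row.username,
    // Anything past the end of the ladder scores zero.
    points: i < POINTS.length ? POINTS[i] : 0,
    rank: i + 1,
  }));
}
```

Keeping the zero-point entries in the mapped result is what lets an overlay show five names while only three of them score.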

What the clock actually measures

The whole system rests on one number: the timestamp stored with each answer. It is a server-side Date.now() captured at the moment the engine accepts the answer, not a client clock and not the chat message's own time. That matters. Viewers are spread across different latencies and Twitch's IRC delivery is not perfectly ordered, so the only timestamp I can reason about consistently is when the message reached my engine. It is not a stopwatch on the human; it is the arrival order at the server, and I am honest with myself that those are different things.

One subtle rule shapes the race more than anything: first answer wins, and a user only gets one. Before inserting, the engine checks whether this person has already answered the current question, and if so it rejects the new message outright. Their original letter is preserved even if they panic and type a correction:

  • You cannot fish for the right answer by spamming A, B, C, D and hoping one lands. Your first letter is locked in.
  • The duplicate check is case-insensitive. I store a username_normalized (lowercased) alongside the display name so that Alice, alice, and ALICE are treated as the same person, while the leaderboard still shows their chosen casing.

That normalized column also does the score aggregation. Points roll up with an upsert keyed on (session_id, username_normalized), so a viewer who changes their display capitalization mid-game does not accidentally split into two leaderboard entries.
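The duplicate check and the score upsert can be sketched without a database. This is an in-memory stand-in, assuming a single session; the Set and Map replace the SQLite tables, and acceptAnswer, addPoints, and normalize are hypothetical names. Keeping the first-seen casing as the display name is also an assumption of this sketch.

```typescript
// In-memory stand-ins for the answers and scores tables (an assumption
// for illustration; the post keeps both in SQLite, keyed by session too).
const answered = new Set<string>(); // one key per user per question
const scores = new Map<string, { displayName: string; points: number }>();

// Lowercasing is what makes Alice, alice, and ALICE the same person.
const normalize = (username: string): string => username.toLowerCase();

// Returns false when this user has already answered the question,
// so the original letter stays locked in.
function acceptAnswer(questionIndex: number, username: string): boolean {
  const key = `${questionIndex}:${normalize(username)}`;
  if (answered.has(key)) return false;
  answered.add(key);
  return true;
}

// Upsert keyed on the normalized name: a casing change mid-game
// adds to the existing entry instead of creating a second one.
function addPoints(username: string, points: number): void {
  const key = normalize(username);
  const existing = scores.get(key);
  if (existing) {
    existing.points += points;
  } else {
    scores.set(key, { displayName: username, points });
  }
}
```

In SQLite the same idea would be an insert with a conflict clause on (session_id, username_normalized) that increments the stored total.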

Parsing a noisy chat

A live Twitch chat is not a clean input stream. People type reactions, emotes, second guesses, and jokes at the same time the round is open. If the parser is generous, the answer counts fill with noise and the speed ranking stops meaning anything. So the parser is deliberately strict: it accepts exactly one letter, A through D, case-insensitive, trimmed, and rejects everything else.

export function parseAnswer(message: string): ValidAnswer | null {
  const trimmed = message.trim().toUpperCase();
  if (trimmed.length === 1 && ['A', 'B', 'C', 'D'].includes(trimmed)) {
    return trimmed as ValidAnswer;
  }
  return null;
}

So "b" and "B" are answers; "B!!!", "I think it is B", and "E" are not. The narrowness is the helpful part. The parser test enumerates these cases explicitly, including rejecting "Answer: A" and "AB", because the line between "a real answer" and "a comment that happens to contain a letter" is exactly where the timestamp gets stamped, and that decision picks the winner.

Decide what counts before you decide who wins. A loose parser does not just add noise; it hands points to whoever's reaction emote parsed by accident.
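The accept and reject cases named above can be written as a small table-driven check. The parser is repeated here so the snippet stands alone, and the ValidAnswer definition is an assumption, since the post does not show that type.

```typescript
// Assumed definition; the post does not show the ValidAnswer type.
type ValidAnswer = 'A' | 'B' | 'C' | 'D';

// Same rule as the parser above: exactly one letter A-D after trimming.
function parseAnswer(message: string): ValidAnswer | null {
  const trimmed = message.trim().toUpperCase();
  if (trimmed.length === 1 && ['A', 'B', 'C', 'D'].includes(trimmed)) {
    return trimmed as ValidAnswer;
  }
  return null;
}

// Each pair is [chat message, expected parse result].
const cases: Array<[string, ValidAnswer | null]> = [
  ['b', 'B'],
  ['  B  ', 'B'],
  ['B!!!', null],
  ['I think it is B', null],
  ['E', null],
  ['Answer: A', null],
  ['AB', null],
  ['', null],
];

// Collect every case where the parser disagrees with the table.
const failures = cases.filter(
  ([input, expected]) => parseAnswer(input) !== expected,
);
```

An empty failures array means the parse boundary sits where the table says it should.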

The same engine, simulator or Twitch

There is an awkward problem with building a chat game: you cannot summon a live audience every time you want to test a change. So the engine never talks to Twitch directly. It exposes a single method, submitAnswer(username, answer), and everything funnels through it.

  • The Twitch adapter listens on tmi.js message events, runs parseAnswer, and on a valid letter calls submitAnswer on the target session. Anything that does not parse is simply counted and dropped.
  • In development I drive the exact same method from a small "Simulated Chat" panel in the host UI. The panel POSTs to /api/sessions/:uuid/answer, so I can type as a dozen fake viewers and watch scoring resolve with no stream running.
  • There is even a bulk endpoint that fires many random answers at once, which is how I load-test the duplicate check and the counts under a flood.

Because all three paths land on the same submitAnswer, the scoring logic has exactly one code path to be correct. What I exercise on my laptop is what runs live. A new chat source would be a new adapter, not a rewrite of the engine.
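The shape of such an adapter might look like the sketch below. Only submitAnswer comes from the post; AnswerSink, createChatAdapter, and the injected parse function are hypothetical names, and the recording sink stands in for the real engine. No chat library is involved, so this shows the funnel rather than the actual Twitch wiring.

```typescript
type ValidAnswer = 'A' | 'B' | 'C' | 'D';

// The single entry point described in the post.
interface AnswerSink {
  submitAnswer(username: string, answer: ValidAnswer): void;
}

// Hypothetical adapter factory. `parse` is injected so the sketch stays
// independent of any particular chat source.
function createChatAdapter(
  sink: AnswerSink,
  parse: (message: string) => ValidAnswer | null,
) {
  let ignored = 0;
  return {
    onMessage(username: string, message: string): void {
      const answer = parse(message);
      if (answer === null) {
        ignored += 1; // counted and dropped, never scored
        return;
      }
      sink.submitAnswer(username, answer);
    },
    ignoredCount(): number {
      return ignored;
    },
  };
}

// Usage: a recording sink stands in for the engine.
const received: string[] = [];
const recordingSink: AnswerSink = {
  submitAnswer: (username, answer) => {
    received.push(`${username}:${answer}`);
  },
};

// Stand-in parser with the same accept/reject rule as parseAnswer.
const strictParse = (message: string): ValidAnswer | null => {
  const letter = message.trim().toUpperCase();
  return ['A', 'B', 'C', 'D'].includes(letter) ? (letter as ValidAnswer) : null;
};

const adapter = createChatAdapter(recordingSink, strictParse);
adapter.onMessage('alice', 'b');
adapter.onMessage('bob', 'lol B');
```

A simulator or bulk endpoint would call the same sink directly, which is why the engine itself never needs to know where a message came from.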

Where the data lives

Every answer is a row in a SQLite answers table: session, question index, username, normalized username, the letter, the timestamp, whether it was correct, and the points it eventually earned. I chose to persist answers rather than keep them only in memory for two reasons. First, scoring becomes a query, not a hand-rolled sort, and SQLite's ORDER BY timestamp ASC LIMIT 5 with an index on (session_id, question_index, is_correct, timestamp) stays fast even when a popular round draws hundreds of answers. Second, the durable state means a session can survive a server restart.

That recovery turned out to be one of the more interesting edge cases. Each phase writes its deadline as a phase_ends_at timestamp. If the server restarts mid-round, the engine reloads the session and compares that deadline to Date.now(): if time is left it resumes the timer with the remainder, and if the deadline already passed while the process was down, it transitions straight to the next phase. The race does not silently freeze because a deploy happened to land in the middle of it.
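The restart decision reduces to one comparison. Here is a minimal sketch, assuming phase_ends_at is stored in epoch milliseconds so it can be compared directly with Date.now(); recoverPhase and the Recovery type are hypothetical names.

```typescript
// What the engine should do with a reloaded session.
type Recovery =
  | { action: 'resume'; remainingMs: number }
  | { action: 'advance' };

// `phaseEndsAt` is the persisted deadline (assumed epoch milliseconds).
// `now` is a parameter so the decision can be checked without waiting.
function recoverPhase(phaseEndsAt: number, now: number = Date.now()): Recovery {
  const remainingMs = phaseEndsAt - now;
  if (remainingMs > 0) {
    // Time is left: restart the timer with only the remainder.
    return { action: 'resume', remainingMs };
  }
  // The deadline passed while the process was down: move on immediately.
  return { action: 'advance' };
}
```

Storing an absolute deadline rather than a remaining duration is what makes this work: the downtime is subtracted automatically.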

A race you can reproduce

A scoring system that turns on timing is exactly the kind of thing that is miserable to test if you are not careful. The trick is that I do not have to fake the clock at all. Because order is decided by arrival, a test simply submits answers in the order it wants and the timestamps come out sequential and monotonic. The canonical scoring test reads almost like a script of a real round:

session.submitAnswer('user1', 'B'); // 1st correct -> 3 pts
session.submitAnswer('user2', 'A'); // wrong
session.submitAnswer('user3', 'B'); // 2nd correct -> 2 pts
session.submitAnswer('user4', 'B'); // 3rd correct -> 1 pt
session.submitAnswer('user5', 'B'); // 4th correct -> 0 pts

session.nextQuestion();              // resolve the round
const { topAnswerers } = session.getOverlayState();
expect(topAnswerers[0]).toEqual({ username: 'user1', points: 3, rank: 1 });
expect(topAnswerers[1]).toEqual({ username: 'user3', points: 2, rank: 2 });

The other half of reproducibility is the database. Each test spins up a fresh in-memory SQLite instance and tears it down afterward, so there is no shared state leaking between cases and no real data directory to clean. The same engine code runs against an on-disk database in production and an ephemeral one in the suite. The tests then cover the things I most worried about: ties and order, no-correct-answer rounds (topAnswerers comes back empty), case-folded duplicate users, score accumulation across questions, and the rule that the first answer is the one preserved.

One honest note on randomness: question order is shuffled with a Fisher-Yates pass so no two rounds feel identical, and in "unlimited" mode it reshuffles and loops forever. That shuffle uses plain Math.random(), which cannot be seeded, so the order is unpredictable for viewers in practice (it is not a cryptographic source, and a trivia shuffle does not need one). The tests stay deterministic not by seeding the shuffle but by knowing each pack's fixed correct answers and asserting on the scoring logic, which is the part where bugs actually hide.
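For reference, a Fisher-Yates pass looks roughly like this. The shuffle name and the optional random parameter are assumptions of this sketch; Quizo calls Math.random() directly and does not inject a generator, so the parameter is here only to make the example checkable.

```typescript
// Fisher-Yates shuffle that returns a new array and leaves the input alone.
// `random` must return a number in [0, 1); it defaults to Math.random.
function shuffle<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const result = [...items];
  // Walk from the end, swapping each slot with a random earlier-or-same slot.
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}
```

Because each position is swapped with a uniformly chosen index at or below it, every ordering is equally likely when the generator is uniform.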

Takeaways

Scoring chat answers by speed reads like a single rule, but the interesting work lived all around it.

  • Decide what counts before who wins. A strict single-letter parser keeps a noisy chat from polluting the race, because the parse boundary is where the deciding timestamp gets stamped.
  • Record enough to rank, not just to tally. A username, a normalized username, and a server-side timestamp on every answer turn a correct-or-not check into an orderable race.
  • Be precise about what your clock means. Quizo measures arrival order at the server, not human reaction time, and saying so out loud kept the design honest.
  • Funnel every input through one method. Simulated chat, a bulk spammer, and live Twitch all call submitAnswer, so there is a single place that has to be right.
  • You may not need to fake the clock. When order comes from arrival, tests get determinism for free just by submitting in sequence against a fresh in-memory database.