Keeping the model key off the device behind a small Express proxy

Pinball Points reads scores from a photo by sending the image to a vision model and asking for structured JSON back. That call has to be authorized somewhere, with a real provider credential. The one place I really didn't want that credential to live was inside the app binary that ships to people's phones.

Why the key cannot ride along in the app

A shipped app is not a secret store. Anything compiled into the client can be recovered: pulled out of the IPA, lifted from memory, or read straight off the wire by anyone willing to run a proxy against their own device. If an OpenAI key shipped inside Pinball Points, I think it would only be a matter of time before someone extracted it and started running their own workloads on my bill. There is no rate limit, spend cap, or rotation schedule that makes a client-embedded provider key actually safe, because the attacker holds the key and can call the provider directly, bypassing all of it. The honest fix is to not put it there.

So score analysis never talks to OpenAI directly. The app posts its image to a small proxy I run at score.pinballpoints.com, and the proxy is the only thing that holds the OpenAI credential. The key lives in the server's environment and never leaves it.

The app still carries an app-level key, but it is a different kind of secret: a shared token that only authorizes calls to my proxy, not to OpenAI. If it ever leaked, the worst case is calls to an endpoint I fully control, fronted by auth and a rate limiter, and I can rotate it without touching anything at the provider. On top of that the client stores it XOR-masked rather than as a plain string, so it does not show up in a naive strings dump of the binary. That is obfuscation, not real security, and I treat it that way: the genuine protection is that this token cannot spend money at OpenAI at all.
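To make the masking idea concrete, here is a minimal sketch. It is written in JavaScript to match the proxy code in this post; the app would do the equivalent in its own language. The names xorMask and MASK, the mask bytes, and the sample token are all made up for illustration and are not the app's real values.

```javascript
// Illustration only: XOR masking hides a string from a naive `strings` dump.
// It is obfuscation, not encryption: anyone with the binary can recover MASK.
// MASK and the sample token below are hypothetical values for this sketch.
const MASK = Buffer.from([0x5a, 0x13, 0xc7, 0x2e]);

// XOR every byte with the repeating mask. Applying it twice restores the input.
function xorMask(bytes, mask = MASK) {
  const out = Buffer.alloc(bytes.length);
  for (let i = 0; i < bytes.length; i++) {
    out[i] = bytes[i] ^ mask[i % mask.length];
  }
  return out;
}

// At build time: only the masked bytes would be stored in the client.
const masked = xorMask(Buffer.from("example-app-token", "utf8"));

// At run time: unmask just before setting the X-API-Key header.
const token = xorMask(masked).toString("utf8");
```

The same function masks and unmasks, which is also why this is only a speed bump: the mask has to ship next to the masked bytes.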

A deliberately thin proxy

The proxy is a small Express 5 service whose whole job is to stand between the app and OpenAI's /v1/responses endpoint. I kept it thin on purpose: the less logic it carries, the less there is to break and the less there is to get wrong from a security standpoint. It accepts a validated request, attaches the credential, forwards the call upstream, and pipes the response back.

Piping the response back rather than reconstructing it turned out to matter. The proxy copies OpenAI's status code and content type, reads the body as an ArrayBuffer, and sends those exact bytes on to the client. That keeps the proxy honest: it is a conduit, not a second opinion, and it never gets a chance to reshape or reinterpret what the model returned. The app parses the same response it would have gotten talking to OpenAI directly, just with my key doing the authorizing instead of one it should never have seen.

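// crypto.randomUUID() and fetch are globals in recent Node versions.
app.post("/v1/analyze", auth, rateLimit, validate, async (req, res) => {
  // Short correlation ID and start time, used by the log lines (logging omitted here).
  const requestId = crypto.randomUUID().slice(0, 8);
  const start = Date.now();

  // Forward the validated body upstream with the server-side credential attached.
  // If fetch rejects, Express 5 passes the error to the error handler.
  const openaiRes = await fetch(OPENAI_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(req.body),
  });

  // Mirror the upstream status code and content type.
  res.status(openaiRes.status);
  const ct = openaiRes.headers.get("content-type");
  if (ct) res.set("Content-Type", ct);

  // Send the upstream bytes through without parsing or reshaping them.
  const body = await openaiRes.arrayBuffer();
  res.send(Buffer.from(body));
});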

That is essentially the entire interesting part of the server. Everything else is the guard rails that decide whether a request is allowed to reach this code at all.

Guarding the door

Holding the key server-side solves credential exposure, but it opens a new question: now there is a public endpoint that spends money on every call, so who is allowed to use it? A short stack of middleware answers that before any request touches OpenAI. The route reads app.post("/v1/analyze", auth, rateLimit, validate, ...), and the order is the point: cheap rejections happen first.

Helmet sets sensible security headers on every response.
A single trusted hop via app.set("trust proxy", 1), so Express takes the client address from the one front-end web server that sits in front of it rather than trusting an arbitrary chain of forwarded headers. Trusting too many hops would let a caller spoof their IP and slip past the rate limiter.
App auth via an X-API-Key header checked against an environment value. No key is a 401, a wrong key is a 403, and neither one ever reaches the model.
Rate limiting capped at 30 requests per minute per client, so one device cannot turn the endpoint into a free, unbounded vision runner.
Request validation that rejects anything that would cost money without being a real score photo.
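The two cheapest guards in that list can be sketched as plain Express-style middleware. This is an illustration under assumptions, not the proxy's actual code: makeAuth, makeRateLimit, their options, the 429 status, and the error messages are hypothetical, and the limiter is a simple in-memory fixed window that never prunes old entries.

```javascript
// Sketch of the auth and rate-limit guards as Express-style middleware.
// makeAuth, makeRateLimit, and the error messages are hypothetical.

// 401 when the header is missing, 403 when it is present but wrong.
function makeAuth(expectedKey) {
  return function auth(req, res, next) {
    const provided = req.get("X-API-Key");
    if (!provided) {
      return res.status(401).json({ error: "Missing API key" });
    }
    if (provided !== expectedKey) {
      return res.status(403).json({ error: "Invalid API key" });
    }
    next();
  };
}

// Fixed-window limiter: at most `max` requests per `windowMs` per client IP.
// State lives in process memory, so it resets on restart and is per-process.
function makeRateLimit({ max = 30, windowMs = 60_000, now = Date.now } = {}) {
  const windows = new Map(); // ip -> { start, count }
  return function rateLimit(req, res, next) {
    const t = now();
    let w = windows.get(req.ip);
    if (!w || t - w.start >= windowMs) {
      w = { start: t, count: 0 };
      windows.set(req.ip, w);
    }
    w.count += 1;
    if (w.count > max) {
      return res.status(429).json({ error: "Too many requests" });
    }
    next();
  };
}
```

Wired up, these would be passed to the route in the same order as above, for example makeAuth(process.env.APP_API_KEY) followed by makeRateLimit(), where APP_API_KEY is an assumed variable name. The rate limiter keys on req.ip, which is why the trusted-hop setting matters: a spoofable address would make the per-client cap meaningless.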

The validation step is the one I would call non-obvious. It does three checks, and each maps to a way the endpoint could otherwise be abused. First, the model must be on a small allowlist (the cheap vision tiers the app actually uses); a caller cannot ask my key to run an expensive model I never intended to pay for. Second, the input field must be a non-empty array. Third, and this is the useful one, at least one message must actually contain an input_image:

// Optional chaining keeps a null or non-object entry from throwing.
const hasImage = input.some((msg) => {
  const content = msg?.content;
  if (!Array.isArray(content)) return false;
  return content.some((item) => item?.type === "input_image");
});
if (!hasImage) {
  return res.status(400).json({ error: "Request must include an input_image" });
}

Without that last check the proxy would happily forward a text-only prompt, which means a leaked app token could quietly repurpose my OpenAI key as a general chat endpoint. Requiring an image keeps the endpoint shaped like the one thing it exists to do: look at a pinball display and read the scores.
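Put together, the three checks might look like the sketch below. The allowlist contents are an assumption: gpt-5-nano is the only model name that appears in this post, and the real list is not shown. The error messages for the first two checks are also hypothetical.

```javascript
// Sketch of a validate middleware combining the three checks.
// ALLOWED_MODELS is an assumption; the real allowlist is not shown in this post.
const ALLOWED_MODELS = new Set(["gpt-5-nano"]);

function validate(req, res, next) {
  const { model, input } = req.body ?? {};

  // 1. Only models the app actually uses.
  if (typeof model !== "string" || !ALLOWED_MODELS.has(model)) {
    return res.status(400).json({ error: "Model not allowed" });
  }

  // 2. input must be a non-empty array of messages.
  if (!Array.isArray(input) || input.length === 0) {
    return res.status(400).json({ error: "input must be a non-empty array" });
  }

  // 3. At least one message must carry an input_image item.
  const hasImage = input.some(
    (msg) =>
      Array.isArray(msg?.content) &&
      msg.content.some((item) => item?.type === "input_image")
  );
  if (!hasImage) {
    return res.status(400).json({ error: "Request must include an input_image" });
  }

  next();
}
```

Every failure path returns before next() is called, so a rejected request never reaches the handler that spends money.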

Sizing the body to the real payload

The body parser is set to express.json({ limit: "12mb" }), and that number is not arbitrary. The client sends images as base64 inside the JSON body, and base64 inflates bytes by about a third. Before upload the app downscales each frame to a 1024px long edge and re-encodes it as JPEG at quality 0.7, which keeps a typical capture well under a megabyte on the wire. The 12mb ceiling leaves comfortable room for the multi-image path, where the app sends several photos of the same display at once so the model can cross-reference an ambiguous digit, while still rejecting anything absurdly large before it is buffered into memory. A body limit is a quiet but real denial-of-service guard: without one, a single oversized POST can pin your process.
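The arithmetic behind that ceiling is easy to check. The sketch below assumes a deliberately pessimistic image size of 1,000,000 bytes (the post only says typical captures are well under a megabyte) and ignores the surrounding JSON, which is small next to the image data. The function names are hypothetical.

```javascript
// Back-of-the-envelope check of the 12mb limit against base64 overhead.
// Base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64Length(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Express parses "12mb" with the bytes module, where 1mb = 1024 * 1024 bytes.
const LIMIT_BYTES = 12 * 1024 * 1024;

// How many images of a given raw size fit under the limit, ignoring the
// prompt text and field names in the surrounding JSON.
function imagesThatFit(rawImageBytes, limit = LIMIT_BYTES) {
  return Math.floor(limit / base64Length(rawImageBytes));
}

// Assumed worst case: every JPEG is a full 1,000,000 bytes.
console.log(base64Length(1_000_000)); // 1333336
console.log(imagesThatFit(1_000_000)); // 9
```

Even under that pessimistic assumption the limit holds nine images, so the multi-image path has room while a truly oversized POST is still rejected.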

Watching cost without watching content

An endpoint that spends money on every request needs to be observable, but observability and privacy pull in opposite directions. I did not want logs full of the photos people take in arcades and basements. So the proxy logs the shape of each request, not its contents.

Every request gets a short correlation ID from crypto.randomUUID().slice(0, 8), and the structured log lines record the model, the client IP, the upstream status code, and the elapsed time. That is enough to answer the questions that actually come up: which model is being hit, how fast it responds, how often it fails, and roughly what it is costing. It is also enough to trace one odd request end to end by its ID. What never gets written is the base64 image itself.

[proxy] id=a3f9c1d2 model=gpt-5-nano ip=10.0.0.4 forwarding...
[proxy] id=a3f9c1d2 model=gpt-5-nano status=200 elapsed=812ms

A log pair like that tells me what I need about reliability and spend and nothing about what anyone photographed. The client IP is the most identifying thing in there, and it is only a network address, as reported by the one trusted hop, not the picture.
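One way to keep the payload out of the logs by construction is to build each line from a small set of picked fields rather than from the request itself. The helper names below (requestMeta, logForward, logResult) are hypothetical; only the line format follows the sample log pair above.

```javascript
// Sketch of payload-free logging. Helper names are hypothetical;
// the line format follows the sample log lines in this post.

// Pick only the fields worth logging. The image data in req.body.input
// is never copied, so it cannot leak into a log line by accident.
function requestMeta(req, id) {
  return { id, model: req.body?.model, ip: req.ip };
}

function logForward({ id, model, ip }) {
  return `[proxy] id=${id} model=${model} ip=${ip} forwarding...`;
}

function logResult({ id, model }, status, startedAt, now = Date.now()) {
  return `[proxy] id=${id} model=${model} status=${status} elapsed=${now - startedAt}ms`;
}
```

In the handler, the first line would be written just before the upstream fetch and the second just after the response is sent, both from the same requestMeta object.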

Binding to localhost, not the world

One last detail keeps the proxy from being more exposed than it needs to be. The listener binds to 127.0.0.1 rather than 0.0.0.0, so the process is not reachable on any public interface at all. It sits behind the front-end web server, which terminates TLS, owns the public address, and forwards requests inward over loopback. Binding to localhost means the only way to reach the proxy is through that front door, which is exactly where the trusted-hop and auth checks live. Get the bind address wrong and every other guard becomes optional, because someone can hit the port directly.
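The bind address is a single argument, which is part of why it is easy to get wrong. The sketch below uses Node's http module directly so it stands alone; startLocalOnly is a hypothetical helper name, and the handler can be an Express app, since an Express app is a valid request listener.

```javascript
const http = require("node:http");

// Sketch: listen on loopback only. startLocalOnly is a hypothetical name.
function startLocalOnly(handler, port) {
  return new Promise((resolve, reject) => {
    const server = http.createServer(handler);
    server.once("error", reject);
    // Passing a host restricts the listener to that interface.
    // Omitting it would make the server listen on all interfaces.
    server.listen(port, "127.0.0.1", () => resolve(server));
  });
}
```

With Express the equivalent is app.listen(port, "127.0.0.1"), which passes its arguments through to the same underlying listen call.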

The OpenAI key never ships, the endpoint never faces the open internet on its own, and the logs never hold a single image.

What I took away

Do not embed a provider credential in a client; route through a server that holds it, and give the client only a token that can spend nothing on its own.
Keep the proxy thin and pipe the upstream status, content type, and body through unchanged, so it stays a conduit instead of a second opinion.
Order your middleware cheapest-rejection-first, and put auth, rate limiting, and validation in front of anything that costs money per call.
Validate the request's shape, not just its presence: an allowlisted model plus a required input_image kept my key from being repurposed as a general chat endpoint.
Size the body limit to your real payload; it doubles as a quiet denial-of-service guard.
Log correlation ID, model, latency, and status, never the payload, and bind internal services to localhost so the front-end web server is the only public face.
