Leaving The Matrix
nova-dev · 8 min read

The brain swap: pluggable AI providers in one chat loop

How Nova flips between Claude, Gemini, Ollama, OpenRouter, and an unofficial Claude Max proxy mid-conversation — and the five gotchas that made the Max provider painful.

#nova #providers #anthropic

If a personal AI tool depends on one model lab's API, you're going to have a bad week eventually. Rate limits, regional outages, pricing changes, model deprecations — they're not hypothetical, they're scheduled. So I built Nova to swap brains at runtime from the start.

The header has a provider dropdown and a model dropdown. Pick anthropic + claude-sonnet-4-6, send a message — Sonnet answers. Open the dropdown again, pick google + gemini-2.5-pro, send the next one — Gemini answers, in the same conversation. The chat doesn't reset. Tool calls keep working. Voice keeps working. The only thing that changes is the personality on the other end.

The provider contract

Every provider is a function that returns whatever the Vercel AI SDK's streamText wants for its model argument. That's it. The agent loop calls one function, gets back a model, hands it to streamText with the conversation and tool list. The provider's only job is to know how to construct the SDK's adapter for its respective lab.

// packages/core/src/agent/providers/index.ts
import type { LanguageModel } from 'ai';

// A provider is just a factory: Nova's config in, an AI SDK model out.
export type ProviderFactory = (config: ProviderConfig) => LanguageModel;

const PROVIDERS = {
  anthropic: anthropicProvider,
  google: googleProvider,
  ollama: ollamaProvider,
  openrouter: openrouterProvider,
  max: maxProvider,
  mock: mockProvider,
} satisfies Record<ProviderId, ProviderFactory>;

Anthropic, Google, OpenRouter, and Ollama are essentially one-liners — the SDK has first-class adapters for each. mock exists so I can develop without spending tokens. The interesting one is max.
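
To make the contract concrete, here's roughly what the consuming side looks like. This is a sketch, not Nova's actual loop: runTurn and the tools parameter are illustrative, while streamText, CoreMessage, and CoreTool are real AI SDK exports.

import { streamText, type CoreMessage, type CoreTool } from 'ai';

// Illustrative consumer: resolve the factory, build the model, stream.
function runTurn(
  id: ProviderId,
  config: ProviderConfig,
  messages: CoreMessage[],
  tools: Record<string, CoreTool>,
) {
  const model = PROVIDERS[id](config); // one call, one brain
  return streamText({ model, messages, tools }); // the loop never knows which lab
}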

The Anthropic Max provider

I pay for Anthropic's Max plan because I use Claude all day. Anthropic does not officially expose Max-tier inference through the API — the plan is for the chat product. There's an unofficial open-source proxy called anthropic-max-router that authenticates as the Claude desktop app and re-exposes the Max subscription as a local Anthropic-compatible endpoint on http://localhost:3000. If I'm going to use it personally, I want it bound into Nova so I never think about it.

The plan was simple: write a provider that points the Anthropic SDK at localhost:3000 instead of the real API. The SDK already lets you point any provider at a custom base URL.

Then I hit five problems in a row.

Gotcha 1: spawn EINVAL on Windows .cmd shims

I had Electron auto-spawn the router via spawn('npx', ['--yes', 'anthropic-max-router']). On Windows, npx is a .cmd shim, and Node's child_process can't spawn a .cmd directly without a shell. The fix is the most googled boilerplate in Electron-on-Windows: shell: true. Easy to fix, painful to discover.
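
The failing call and the fix, sketched with Node's child_process (the npx invocation is the one from above):

import { spawn } from 'node:child_process';

// On Windows, npx resolves to npx.cmd, and Node refuses to spawn a .cmd
// without a shell (hence spawn EINVAL). shell: true routes it through cmd.exe.
const router = spawn('npx', ['--yes', 'anthropic-max-router'], {
  shell: process.platform === 'win32',
});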

Gotcha 2: the Anthropic SDK posts to ${baseURL}/messages

I set baseURL: 'http://localhost:3000' and watched every request 404 with no logs. The router's endpoint is /v1/messages; the Anthropic SDK appends /messages verbatim, expecting the baseURL to already include /v1. So baseURL: 'http://localhost:3000/v1'. The SDK is not wrong; the docs are just quiet about it.
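
Here's the difference in miniature, assuming the @ai-sdk/anthropic adapter:

import { createAnthropic } from '@ai-sdk/anthropic';

// The SDK POSTs to `${baseURL}/messages`, so:
createAnthropic({ baseURL: 'http://localhost:3000' });    // POST /messages → 404
createAnthropic({ baseURL: 'http://localhost:3000/v1' }); // POST /v1/messages → router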

Gotcha 3: the router writes .oauth-tokens.json to CWD

The first time I auto-spawned the router, it picked up no auth and crashed on every request. The router stores OAuth tokens in ./.oauth-tokens.json, relative to the working directory it was launched from. When Electron spawns it, the cwd is wherever Electron is — varies by environment. Pinned the spawn cwd to app.getPath('userData') (which is %APPDATA%\@nova\desktop on Windows), and the auth file finally lands somewhere stable. The router's auth CLI lives at dist/cli.js in the package and isn't exposed as a bin, so the one-time login still has to be a manual node <path> from the userData dir. Once. Forever.
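
Sketched under the same assumptions as the gotcha-1 snippet, now with a pinned working directory:

import { app } from 'electron';
import { spawn } from 'node:child_process';

const router = spawn('npx', ['--yes', 'anthropic-max-router'], {
  shell: process.platform === 'win32',
  // The router writes ./.oauth-tokens.json relative to its cwd,
  // so pin it somewhere stable instead of wherever Electron happens to be.
  cwd: app.getPath('userData'),
});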

Gotcha 4: the AI SDK sends temperature: 0 by default

This one cost me an afternoon. Every Max-routed request returned a 400. The error: "temperature is not supported on this model." The Vercel AI SDK v4 defaults to temperature: 0 if you don't pass one. Anthropic's reasoning models, when accessed through the Claude.ai subscription path, refuse any temperature parameter at all — not just non-zero ones. The router faithfully forwards the field, the upstream rejects it, the SDK never knew. Solved by wrapping the provider's fetch to strip temperature, top_p, and top_k from outbound bodies before they hit the wire:

// packages/core/src/agent/providers/max.ts
// Custom fetch for the Max provider: remove sampling params before the
// request leaves the process, since the subscription path rejects them all.
const stripParams: typeof fetch = async (input, init) => {
  if (init?.body) {
    const body = JSON.parse(init.body as string);
    delete body.temperature;
    delete body.top_p;
    delete body.top_k;
    init.body = JSON.stringify(body);
  }
  return fetch(input, init);
};
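
And the wiring, sketched on the assumption that the max provider is built on @ai-sdk/anthropic's createAnthropic, whose fetch option accepts a custom implementation (the dummy apiKey is my guess at a value the router ignores):

import { createAnthropic } from '@ai-sdk/anthropic';

const max = createAnthropic({
  baseURL: 'http://localhost:3000/v1', // gotcha 2's fix
  apiKey: 'unused', // assumption: the router does its own auth upstream
  fetch: stripParams, // gotcha 4's fix
});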

Gotcha 5: a transient "operation aborted" after a 200

Once, after a successful response, the very next request failed with "This operation was aborted". Then everything worked again. I haven't reproduced it, I haven't root-caused it, and it's been on the open-issues list for a few weeks. Logging it here so I remember to circle back.

The supervisor

The router can crash. The router can be killed by something else holding port 3000. The router can fail health checks if I just woke my laptop. So Nova doesn't just spawn it once — there's a supervisor in the main process (sketched after the list) that:

  • Health-checks http://localhost:3000/health on a one-second cadence.
  • Restarts the child with backoff if it dies.
  • Pipes stdout and stderr into Nova's dev.log as [max-router] … lines.
  • Broadcasts state changes to the renderer via three IPC channels: nova:max-router:status, :restart, and :state-changed.
  • Drives a banner in the chat header that only appears when the active provider is max and the router isn't running. Click it and it triggers a restart.
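
A minimal sketch of that loop, assuming the /health endpoint and IPC channel name from this post. log() and broadcast() stand in for Nova's real logging and renderer plumbing, and the backoff numbers are illustrative:

import { spawn, type ChildProcess } from 'node:child_process';
import { app, BrowserWindow } from 'electron';

const log = (line: string) => console.log(line); // stands in for dev.log

function broadcast(channel: string, payload: unknown) {
  for (const win of BrowserWindow.getAllWindows()) {
    win.webContents.send(channel, payload);
  }
}

let child: ChildProcess | null = null;
let backoffMs = 1_000;

function startRouter() {
  child = spawn('npx', ['--yes', 'anthropic-max-router'], {
    shell: process.platform === 'win32',
    cwd: app.getPath('userData'),
  });
  child.stdout?.on('data', (d) => log(`[max-router] ${d}`));
  child.stderr?.on('data', (d) => log(`[max-router] ${d}`));
  child.on('exit', () => {
    setTimeout(startRouter, backoffMs); // restart with capped backoff
    backoffMs = Math.min(backoffMs * 2, 30_000);
  });
}

// One-second health cadence; the header banner keys off this channel.
setInterval(async () => {
  try {
    const res = await fetch('http://localhost:3000/health');
    if (res.ok) backoffMs = 1_000; // healthy again: reset backoff
    broadcast('nova:max-router:status', { running: res.ok });
  } catch {
    broadcast('nova:max-router:status', { running: false });
  }
}, 1_000);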

It works. It also gave me a Windows-process-tree war story that took out a whole afternoon — when I kill cmd.exe, the grandchild node.exe running the router survives, holds port 3000, and the supervisor's next spawn loops forever on EADDRINUSE. That's a future post.

What it bought me

Two months in, the brain-swap dropdown has paid for itself many times over:

  • When Anthropic had a regional incident, I flipped to Gemini and kept going.
  • When I want to pressure-test a feature on a cheap model, I flip to Ollama (local, free) or a low-cost OpenRouter model instead of burning premium tokens.
  • When I want my Max plan's quota to do the work for me, I flip to max and the proxy handles it.
  • When I'm developing the UI and don't want to think about responses, I flip to mock.

One chat loop, six brains. The provider abstraction was probably the single best architectural call I made on this project. It's also the one I almost skipped because "I'll just hardcode Anthropic for now."

Next up: voice. The presence orb, the Whisper sidecar, and why Cartesia's streaming TTS is the latency choice that makes the whole thing feel real.

Want this in real time?

Discussion happens in the Discord.
