Picking the stack: Electron, Next.js, and the Vercel AI SDK

Why a Jarvis clone in 2026 ends up as a pnpm monorepo with an Electron main process, a Next.js renderer, and a swappable provider layer.

#nova #architecture

Day-zero choices on a project like this matter more than they should. Get them wrong and every feature after costs 2x. Get them right and the next six months feel like the codebase is helping you. Here's how I picked Nova's stack and what each choice was actually buying me.

Desktop runtime: Electron

I went back and forth on Tauri. Tauri ships a smaller binary, uses the system webview, and the Rust core is genuinely a pleasure to work in. But I want to spawn Python sidecars, talk to native Windows APIs, do tray icons, hook into global shortcuts, and pipe child-process stdout into the renderer DevTools console. Every one of those is a paved road in Electron and a side quest in Tauri. Electron also lets me ship the same Node ecosystem on the main process that I'm already using everywhere else.
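To make that concrete, here is roughly what the sidecar-plus-shortcut wiring looks like in a main process. A sketch rather than Nova's actual code; the file path, channel name, and sidecar script are illustrative.

// apps/desktop/src/main/sidecar.ts (illustrative, not Nova's real layout)
import { app, BrowserWindow, globalShortcut } from 'electron';
import { spawn, type ChildProcess } from 'node:child_process';

// Spawn the Python sidecar and mirror its stdout into the renderer,
// where a preload listener can forward it to the DevTools console.
export function startSidecar(win: BrowserWindow): ChildProcess {
  const proc = spawn('python', ['-u', 'sidecars/stt/main.py']);
  proc.stdout.on('data', (chunk: Buffer) => {
    win.webContents.send('sidecar:stdout', chunk.toString('utf8'));
  });
  proc.on('exit', (code) => console.log(`sidecar exited with ${code}`));
  return proc;
}

// Global shortcuts are a single registration call on the main process.
app.whenReady().then(() => {
  globalShortcut.register('CommandOrControl+Shift+Space', () => {
    BrowserWindow.getAllWindows()[0]?.show();
  });
});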

The cost is real. A 200MB installer, a per-window Chromium tax, and a child-process model that has cost me actual sleep (more on that in a later post). I'm paying it knowingly.

UI framework: Next.js 15 in static-export mode

The renderer is just a Next.js app with output: 'export'. Electron loads the static build off disk in production and points at the dev server during development. I get React 19, Tailwind 4, the app router, server components I don't actually use yet, and a build pipeline that's already battle-tested.

The win that mattered most: I can develop the UI in a normal browser tab when I just want to iterate on a component. pnpm dev:ui serves the renderer on :5173. pnpm dev spins up Electron and points at the same server. Same code, two harnesses. Hot reload works in both.
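The dev/prod split is one branch in the main process. A minimal sketch, assuming the static export lands in a ui-out folder next to the compiled main process:

// apps/desktop/src/main/window.ts (sketch; the output path is assumed)
import { app, BrowserWindow } from 'electron';
import path from 'node:path';

const DEV_SERVER = 'http://localhost:5173'; // what pnpm dev:ui serves

export async function createWindow(): Promise<BrowserWindow> {
  const win = new BrowserWindow({ width: 1280, height: 800 });
  if (app.isPackaged) {
    // Production: load the Next.js static export (output: 'export') off disk.
    await win.loadFile(path.join(__dirname, '../ui-out/index.html'));
  } else {
    // Development: point at the same dev server a browser tab would use.
    await win.loadURL(DEV_SERVER);
  }
  return win;
}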

Agent layer: Vercel AI SDK

The Vercel AI SDK (specifically streamText) is what makes the brain swap actually painless. I have a single function signature on the agent loop, and the provider it dispatches to is selected at runtime from a registry:

// packages/core/src/agent/providers/index.ts
const PROVIDERS: Record<ProviderId, ProviderFactory> = {
  anthropic: anthropicProvider,
  google: googleProvider,
  ollama: ollamaProvider,
  openrouter: openrouterProvider,
  max: maxProvider,
  mock: mockProvider,
};

The dropdown in Nova's header writes a string to localStorage; the next message is dispatched to whichever provider that string names. Anthropic, Gemini, an Ollama model running locally, OpenRouter for free tiers — same chat, same tools, different brain. I'll write a whole post on the Anthropic Max router, which was the most painful one to wire up; for now, just know that the abstraction held.
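For flavor, here is roughly how the loop dispatches. I'm assuming each ProviderFactory hands back an AI SDK LanguageModel when called, and onDelta stands in for the real IPC push to the renderer:

// Sketch only; ProviderId and PROVIDERS come from the registry above.
import { streamText, type CoreMessage } from 'ai';

export async function runTurn(
  providerId: ProviderId,
  messages: CoreMessage[],
  onDelta: (text: string) => void,
) {
  const model = PROVIDERS[providerId](); // factory shape is assumed
  const result = streamText({ model, messages });
  for await (const delta of result.textStream) {
    onDelta(delta); // tokens stream to the renderer as they arrive
  }
}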

Monorepo: pnpm workspaces

The repo is a pnpm monorepo from day one:

apps/
  desktop/   Electron main process — IPC, windows, voice services
  ui/        Next.js renderer (chat dashboard + HUD)
packages/
  core/      Agent loop, providers, voice, vault I/O, SkillRegistry
  shared/    IPC channel types, DTOs
  skills/    One package per plugin skill

Was this overkill on day one? Absolutely. Did I regret it? Not once. The skills system needs a clean module boundary, and so does the IPC contract between main and renderer. Having those as their own packages forces me to keep DTOs explicit and not let the Electron main process leak into business logic. When I added a new skill last week, I ran pnpm nova:skill:new lookup and got a working scaffold in under a minute.
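What "keep DTOs explicit" means in practice: the shared package owns the channel names and payload shapes, and both sides import them. A hypothetical slice:

// packages/shared/src/ipc.ts (hypothetical shape, for illustration)
export const IpcChannels = {
  chatSend: 'chat:send',
  chatDelta: 'chat:delta',
  sidecarStdout: 'sidecar:stdout',
} as const;

export type IpcChannel = (typeof IpcChannels)[keyof typeof IpcChannels];

// The renderer and main process both type their handlers against these DTOs.
export interface ChatSendDto { text: string; providerId: string }
export interface ChatDeltaDto { turnId: string; delta: string }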

Voice: local STT, streaming TTS

I want to talk to Nova without a dedicated microphone array in front of me. That means decent audio capture, voice-activity detection, and transcription that doesn't ship my speech to a third party every time I think out loud.

  • STT: faster-whisper running medium.en on CUDA via a Python sidecar, fronted by Silero VAD (ONNX) so the renderer only ships actual speech to the transcriber (sketched after this list).
  • TTS: Cartesia Sonic, streamed. Streaming is non-negotiable; nothing ruins the illusion like waiting for a full audio file before playback starts.
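
The gating logic is small. A sketch of the renderer side, where vad() stands in for the Silero ONNX session and sendToTranscriber for the hop to the Python sidecar:

// Only frames the VAD flags as speech ever leave the renderer.
declare function vad(frame: Float32Array): Promise<number>; // Silero speech prob, 0..1
declare function sendToTranscriber(frame: Float32Array | null): void;

const SPEECH_THRESHOLD = 0.5; // threshold value is illustrative
let inSpeech = false;

export async function onAudioFrame(frame: Float32Array) {
  const p = await vad(frame);
  if (p >= SPEECH_THRESHOLD) {
    inSpeech = true;
    sendToTranscriber(frame);
  } else if (inSpeech) {
    inSpeech = false;
    sendToTranscriber(null); // end-of-utterance marker for the sidecar
  }
}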

STT runs cold in about 80ms on my 4090. The presence orb transitions to listening the instant VAD trips, and to speaking the instant the first audio chunk arrives from Cartesia. Felt latency is the metric I optimize for, not first-token throughput.
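Those transitions amount to a tiny state machine. Event names here are invented for illustration:

type OrbState = 'idle' | 'listening' | 'speaking';
declare function setOrb(state: OrbState): void; // the real impl animates the orb

// VAD trip -> listening; first Cartesia chunk -> speaking; playback end -> idle.
export function wireOrb(events: {
  onVadOpen(cb: () => void): void;
  onFirstAudioChunk(cb: () => void): void;
  onPlaybackEnd(cb: () => void): void;
}) {
  events.onVadOpen(() => setOrb('listening'));
  events.onFirstAudioChunk(() => setOrb('speaking'));
  events.onPlaybackEnd(() => setOrb('idle'));
}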

Memory: a markdown vault

Memory is an Obsidian-backed folder under NOVA_VAULT_PATH, watched by chokidar. Conversations log to Conversations/YYYY-MM-DD/. Notes Nova edits show up in Obsidian within milliseconds. There's no DB schema for me to migrate — when I outgrow this, I'll layer SQLite-backed embeddings over the same files for semantic recall.
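Both halves fit in a few lines. A sketch, with the file layout and names assumed:

// packages/core/src/vault/log.ts (sketch; paths and names are assumed)
import { watch } from 'chokidar';
import { appendFile, mkdir } from 'node:fs/promises';
import path from 'node:path';

const vault = process.env.NOVA_VAULT_PATH!;

// Surface external edits (e.g. made in Obsidian) back to the agent.
watch(vault, { ignoreInitial: true }).on('change', (file) => {
  console.log(`vault change: ${path.relative(vault, file)}`);
});

// Conversations append to a dated folder as plain markdown.
export async function logTurn(role: string, text: string) {
  const day = new Date().toISOString().slice(0, 10); // YYYY-MM-DD
  const dir = path.join(vault, 'Conversations', day);
  await mkdir(dir, { recursive: true });
  await appendFile(path.join(dir, 'log.md'), `\n**${role}:** ${text}\n`);
}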

The principle: my data is on my disk in plain text in a folder I picked. Anything fancier has to earn it.

What this stack costs

The downside of every choice above is the same: surface area. Electron's child-process model is a footgun in production. Next.js static export disagrees with anything that wants a server runtime. The Vercel AI SDK has its own opinions about request bodies that don't match every provider. pnpm workspaces means I have to pay attention to peer-dep resolution. The Python sidecar is a separate venv to keep alive.

But every one of those costs is a known cost, and the project has the shape of a thing I can actually ship. The first week was a slog of pnpm install failures and spawn EINVALs. By the end of week two, the chat loop was streaming, voice was working, and skills were loadable. That's a stack that picked itself well enough.

Next up: the brain swap layer. Why pluggable providers turned out to be cheaper than committing to one, and what I learned wiring up Anthropic, Google, Ollama, OpenRouter, and an unofficial Claude Max proxy in the same week.

Want this in real time?

Discussion happens in the Discord.
