I realized last week that WebGPU and WASM now make it possible to run computation-heavy work directly in the browser. I tried running smaller LLMs this way on a MacBook Air, and performance is reasonable.
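To make the "if the browser supports it" part concrete, here is a minimal feature-detection sketch. `navigator.gpu` is the real WebGPU entry point and `WebAssembly` is the standard global; the function name and the navigator-like parameter are hypothetical, just there to keep the logic testable outside a browser:

```typescript
// Sketch: pick the best available in-browser compute backend.
// Pass the real `navigator` in a browser; the parameter exists so the
// logic can be exercised outside one.
function detectBackend(nav?: { gpu?: unknown }): "webgpu" | "wasm" | "none" {
  if (nav && "gpu" in nav && nav.gpu) return "webgpu"; // hardware-accelerated path
  if (typeof WebAssembly !== "undefined") return "wasm"; // CPU fallback
  return "none"; // no viable in-browser backend
}
```

In a real page you would call `detectBackend(navigator)` and then request an adapter via `navigator.gpu.requestAdapter()` before committing to the WebGPU path, since `navigator.gpu` existing does not guarantee a usable adapter.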
This got me thinking about building an app that does the following:
- Stores all your chats, documents, memory, projects, etc. locally
- Runs inference locally when the hardware supports it (a GPU, Apple M-series, or similar)
- Runs private inference otherwise (open-weights model hosted on a rented server)
- Runs anonymous inference against frontier models on request (via API, hiding user data)
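The three tiers above could be selected with a small routing function. This is only a sketch of the decision logic; all names (`InferenceTier`, `pickTier`, the capability flags) are hypothetical, not an existing API:

```typescript
// Sketch of the three-tier routing idea from the list above.
type InferenceTier = "local" | "private-server" | "anonymous-frontier";

interface Capabilities {
  webgpu: boolean;               // e.g. "gpu" in navigator in a real browser
  enoughMemoryForModel: boolean; // device can hold the local model's weights
  userRequestedFrontier: boolean; // explicit opt-in to a frontier model
}

function pickTier(caps: Capabilities): InferenceTier {
  if (caps.userRequestedFrontier) return "anonymous-frontier"; // on request only
  if (caps.webgpu && caps.enoughMemoryForModel) return "local"; // private default
  return "private-server"; // open-weights model on a rented server
}
```

The key design choice is that the frontier tier is never the default: it is only reached by explicit user request, which keeps the privacy-by-default property of the other two tiers.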
In my head this would give us the best of both worlds: personal data stays private by default, and we keep the option open to reach for “more powerful AI” when we need or want it.
I’m fairly confident this can handle most day-to-day tasks locally: research (web search), brainstorming, and document processing (drafting an email, etc.). Image generation will likely be too much for now, but that’s why we can “run anonymously against frontier models” when we need or want to.
Right now I’m on a roller coaster between “this is so cool, I should build it right now” and “Ollama is good enough, and who wants this in a browser anyway?”. That’s why I decided to post here and ask:
Is this worth having? Or am I the only one excited about a “zero-setup, local-first LLM”?