Ente made a bunch of projects as part of its hackathon (most were fun, one-time-use experiments), but one that stood out was its local LLM chat - https://ensu.ente.io/
It basically downloads a Llama model when you visit the page, making it easy for anyone to run on-device AI chat.
It's super cool that WebGPU exists (I didn't know about it before), but it's not very practical for most users. The ensu ente guys were nice enough to leave their app source unminified, so I can see exactly what they're doing, and it wasn't meant to be anything serious. It looks like they quickly vibe coded the site, and it's a very thin wrapper around: GitHub - mlc-ai/web-llm: High-performance In-browser LLM Inference Engine. I see the web-llm guys have a much nicer demo site built out with their lib here: https://chat.webllm.ai/
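To give a sense of how thin that wrapper can be, here's a minimal web-llm sketch. The model ID and prompt are placeholders I picked, not what ensu actually ships, so check web-llm's prebuilt model list for valid IDs:

```ts
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// First run downloads the weights into the browser cache and compiles
// WebGPU kernels on-device; subsequent visits reuse the cached model.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (report) => console.log(report.text), // surface download progress
});

// OpenAI-style chat completion, running entirely in the browser.
const reply = await engine.chat.completions.create({
  messages: [{ role: "user", content: "Hello from the browser!" }],
});
console.log(reply.choices[0]?.message.content);
```

That's more or less the whole trick: the weights live in browser storage and inference goes through WebGPU, so there's no server-side model at all.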
I think the problem with WebGPU-based apps like this is that handheld devices are too weak for local inference, and if you're on a desktop you may as well use a more integrated native tool, which has its own advantages. I know this because I created my own local LLM desktop app (Gerbil). What I personally do is run an LLM locally through my app with the "Remote Tunnel" (via cloudflared) setting enabled, and then I can chat with it when I'm away.
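For anyone curious what the remote side of that setup looks like, here's a hypothetical sketch. I'm assuming the local app exposes an OpenAI-compatible endpoint behind the tunnel; the hostname, path, and model name below are made up for illustration and aren't Gerbil's actual API:

```ts
// Hypothetical: talking to a locally hosted model through a cloudflared
// tunnel from another device. Assumes an OpenAI-compatible
// /v1/chat/completions endpoint; the tunnel URL is a placeholder.
const TUNNEL_URL = "https://my-llm.example.trycloudflare.com";

async function ask(prompt: string): Promise<string> {
  const res = await fetch(`${TUNNEL_URL}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "local-model", // placeholder; many local servers ignore or remap this
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

ask("Summarize my notes from today.").then(console.log);
```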