I want to build an AI/LLM app that preserves privacy. Please roast my idea :)

I know that some of you are as deep into privacy as I am into tech so I am hoping to get some different perspectives than just my own.

Background is that we like local AI because conversations are guaranteed to stay private, but hardware (the “power” of your device) is a real limitation.

What’s new is that technology has advanced quite a bit in the last 4-6 months and I can now run a good enough LLM locally on my phone to chat with docs, draft emails, summarize websites, etc. Ditto for consumer-grade laptops; just better quality. For image generation, it works on a consumer-grade laptop; haven’t tested anything on phones yet.

The other cool recent-ish development (I think) is that we can now build it as a browser app. You don’t need to install anything manually (zero setup) which makes it easy for less technical people. You’d also benefit from sandboxing/security provided by modern browsers.

Combined, these two things seem useful enough as a technology that I am thinking of making this more widely accessible.


So here’s my crazy idea: Build an AI chat app that runs locally on your phone, tablet, or laptop and that has all the features of “commercial-grade” AI apps like ChatGPT or Claude but (1) stores all your data locally, (2) runs the compute locally, (3) only has access to files you actually give it access to, and (4) is open-source so people can inspect what it’s doing.

Another thought that crossed my mind is to have an opt-in fallback option for very low-end phones or really demanding tasks: do the compute remotely. This would mean running inference on private servers with (1) explicit approval that this should happen, (2) no logs/storage/training, (3) masking of any PII before the request is sent, and (4) open-source server code.
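To make (3) a bit more concrete, here is a minimal sketch of what on-device PII masking might look like before a request ever leaves the phone. Everything here is my own illustration (function names like `mask_pii` and the regex patterns are assumptions, not from any existing app); a real implementation would pair this with an on-device NER model, since regexes alone miss a lot:

```python
import re

# Illustrative only: replace obvious identifiers with placeholders before
# sending a prompt to a remote server, and restore them in the response.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask_pii(text: str) -> tuple[str, dict[str, str]]:
    """Return masked text plus a mapping to restore PII afterwards."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def repl(match, label=label):
            placeholder = f"[{label}_{len(mapping)}]"
            mapping[placeholder] = match.group(0)
            return placeholder
        text = pattern.sub(repl, text)
    return text, mapping

def unmask(text: str, mapping: dict[str, str]) -> str:
    """Substitute the original PII back into the model's response."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

The server only ever sees the placeholders; the placeholder-to-PII mapping stays on the device.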

In addition to the above, what would such an app need to have for you to consider giving it a try? (Also, why is this a stupid idea and I should be ashamed of myself for even suggesting it :D)

3 Likes

No offense, but this is a very bold claim. "LLM" usually refers to models with over 100B params, so what edge devices can actually run are smaller SLMs (usually below 4B). And the context length and max token count usually have to be squeezed to preserve RAM.

Chatting with documents (e.g. RAG) takes up plenty of processing power as well, and an SLM usually struggles to extract and retain enough detail to answer questions properly.

I don’t do paperwork a lot when I am on the go, so I’m not sure about other use cases.

If I must use a chatbot on my edge device, I would use Proton Lumo. Otherwise I wait until I get home and fire up my GPT-OSS-20B on my PC with 32 GB of RAM.

That sounds like KAI Chat / Duck AI. I am not against the idea, but there’s not much of a use case for me.

4 Likes

you can try something like off-grid mobile AI. The app seems pretty capable. I tried Qwen 3.5 2B Q4, which loads and responds relatively quickly on a Pixel 8, but I’d say it’s still far from useful. I asked it a very basic question and it said a bunch of nonsense.

2 Likes

I do find Qwen 3.5 quite… weird compared to Qwen 3; Qwen 3 was pretty good.

2 Likes

This can be built in a matter of days in reality. Given the low cost barrier to entry, are you sure this hasn’t been done before? In fact if all you’re doing is just acting as a web wrapper around an existing SLM, I’m 99% sure it must already exist.

As for remote compute, that’s essentially opening yourself up to tons of compute cost on other people’s behalf. Your service will be abused. If you plan on charging for this, then at that point I’d rather pay for a state-of-the-art model from Anthropic than for a poorly performing SLM on a remote server. Otherwise, Lumo from Proton somewhat meets these criteria already, just not as FOSS, and I get it for free as part of my Proton subscription.

If you mean “we” as in a business partner, do market research. Is there demand from people who want SLMs on remote machines but aren’t apt enough to self-host or to simply use existing services?

If it’s “we” as in PG, I’d say there is likely already an existing good-enough FOSS solution. If you want to make a new one, ask yourself whether you can meaningfully contribute to an existing project instead, or whether what you have is truly a differentiator.

1 Like

Really? I self-host the 122B Q4 and I really like it. I prefer it over any previous LLM I’ve used. It’s the first one I’ve liked more than gpt-oss-120. I find Qwen 3.5 27B and 35B quite good too. My favourite family of self-hostable models so far.

1 Like

Interesting perspectives; thank you :slight_smile:

I’ve always thought of SLMs vs LLMs as a marketing distinction. They use the same self-attention transformers under the hood, and GPT-2 at 1.5B was once considered an LLM. Today, MoE blurs that line even more: Qwen 3.5 has up to 387B parameters but only 17B are active per token. Still a good call-out @TinFoilHat; SLM vs LLM does make talking about it cleaner.

One key thing about SLMs that (I think) a lot of people miss is that they are not meant to be used as a pocket encyclopedia. Their latent knowledge (memorized facts) is poor, which makes them hallucinate a lot. Below 30B-ish, context engineering and grounding matter a lot, and a thin chat wrapper around an SLM is set up to fail. SLM+search or SLM+KB, however, may surprise you; I just haven’t seen apps dedicated to this yet.
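To illustrate the grounding idea with a toy sketch (my own function names; keyword overlap as a stand-in for real embedding-based retrieval): retrieve the most relevant document chunks first, then build a prompt that pins the SLM to them instead of its unreliable memorized knowledge:

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def build_grounded_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Pick the k chunks with the most word overlap and ground the model in them."""
    top = sorted(chunks, key=lambda c: len(tokens(query) & tokens(c)), reverse=True)[:k]
    context = "\n".join(f"- {c}" for c in top)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not there, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Even this naive retrieval keeps irrelevant chunks out of the context window, which matters most for small models that can’t compensate with latent knowledge.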

you can try something like off-grid mobile AI. The app seems pretty capable.

Cool! I haven’t seen this one yet. I am aware of Jan.ai, Ollama, and various ways of talking to something hosted in your local network, but no mobile apps yet. I’ll check it out.

Given the low cost barrier to entry, are you sure this hasn’t been done before?

@overdrawn98901 Right? You would assume it already exists. If it does and it works well, why isn’t it more popular and the go-to recommendation here on PG?

I’d rather pay for a state-of-the-art model from Anthropic than for a poorly performing SLM on a remote server.

Oh, no no, if we are talking servers then we are talking competitive open-weights LLMs (Kimi, Juggernaut, Qwen, …). Stuff that wouldn’t run on your computer even if you had a decent gaming GPU.

And yes, like most server-side offers, that part would have to be paid. I, unfortunately, don’t have the money to run this as a charity. I do have the skills to set up ops, infra, code, etc. I’m just not sure what’s important beyond OSS and no-logs to show that I am indeed respecting your privacy.

1 Like

That’s exactly what some users (myself included) are achieving with Qubes OS.
You run Ollama with the required models in an offline qube. For more compute power, it’s advisable to pass through the notebook/PC GPU to the “ollama qube”.

Files can only be exchanged between other qubes (with internet access) and the offline Ollama qube using Qubes’ own qrexec agent. This ensures that no data can leave the Ollama qube unintentionally.

The web interface is optional; Open WebUI is a good candidate.

Here are some links about Qubes OS in general and about the setup:

1 Like

The true silver bullet is E2EE, where the two ends are the user and the model and the server is blind to the operations. The perfect-world scenario is FHE, fully homomorphic encryption, but that’s a holy grail. There may be other cryptographic schemes that tackle specific threat models, but none are easy for anyone short of a skilled cryptographer. I’ve got rough ideas, but I would definitely botch it if I even think about it.

Other than that, AGPL-licensed server code is the way to elicit trust that you are not logging. Otherwise it’s brand trust and pinky promises.

3 Likes

Tokenizing on the client + GPU servers with a TEE (Trusted Execution Environment)? If you trust TEE tech, then that’s effectively E2EE with no ability to introspect the content server-side. (The “tiny” problem of getting ML libraries to work inside a confidential computing environment is non-trivial, but I will assume it’s solvable until proven otherwise.)

Downside is that GPUs with TEE support are rather pricey. I’d guess users would pay 50-100 USD/mo for competitive models if you run as a non-profit … until there is enough scale to buy physical hardware, and then the entire dynamic shifts. Would that be too crazy an ask for fully private conversations with a model that’s Opus 4.5 / GPT 5.2 competitive? :thinking:

Also is encryption/security all that matters for privacy? (Feels too simple.)

That’s exactly what some users (myself included) are achieving with Qubes OS.

Interesting stuff, @ls.skuggi. It’s a solution for isolating a local app from the rest of your data, right? Or am I missing a piece here?

If cryptography is a word that elicits a feeling of ease, then I would say you underestimate the sheer number of footguns.

Cryptography makes a data security / privacy problem a key management problem at face value.

This is indeed a possible route to take, but I would emphasize the non-trivial parts and the cost. You’d need some dough to make that scalable. You’d also need a way for users to verify they are actually executing in a TEE and not on a normal processor.

1 Like

No, you misunderstand. I don’t think it’s easy. I’m just surprised that there aren’t a million other things that go into building for privacy beyond “just” doing security right.

1 Like

E2EE mitigates most privacy concerns well when both ends are trusted.

You could say E2EE already applies with HTTPS, but then the other end is the server, which people don’t trust. E2EE where the server isn’t aware of the data requires much more diligence and care.

1 Like

Have you looked at https://anythingllm.com/?

The phone app uses an SLM and can sync to a hosted version on a laptop or server that uses whatever CPU/GPU is available there. It has a web server. MIT licensed… as open as it gets.

1 Like

Open for developers, not for end users.

AGPL is currently as libre as it gets.

Libre, lol. There are tradeoffs either way. There is some value to copyleft licensing, but it can also be used in hostile ways that hurt users. Likewise, you have to have a clear strategy with permissive licensing. But for building large ecosystems that don’t keep code beholden to its creators (or assume those creators won’t be human and exploit whoever locks into their ecosystem), permissive is far more libre in that you aren’t beholden to anything except not suing people for using their free software.

More details here

1 Like

I forgot you already responded to this :slight_smile:

My later response to another user was this, and I think it summarizes my position well.

1 Like

I’m not sure how much the choice of license matters for privacy beyond the ability to inspect the code. I also still don’t understand how to show that the code that is released is indeed the code that runs.

Providers showing A but running B or running A+(stuff) seems like a much bigger concern to me than the debate between GPL-family vs permissive licenses like MIT or BSD.

If you read the thread I linked, I explain how licensing affects users in an ecosystem. Licensing can block, or make way for, who makes decisions around the technical steering of the project. If the wrong people get in charge, or the original developers become split on how to monetize the code, you may continue using an application that is privacy-respecting only to have the rug pulled out from under you.

The licensing and dependency structure alone can’t tell you whether a project is vulnerable to a shift away from being privacy-respecting, but they are a signal alongside the perceptions of the primary developers, how easy it is for new contributions to come in, and so on. If more people are involved and understand the codebase, and the licensing is permissive, this signals that anyone can split off and fork it if a company tries to monetize it. It’s basically a power check through pluralistic ownership.

Au contraire, AGPL is used to scare away businesses that want to use software for free (and this is not bad… or good… but it is something). For consumers of the software, AGPL is mostly the best possible copyleft license, imo as well.

Not really. The license has nothing to do with “eliciting trust” on the deployment side. You’re probably thinking of “remote attestation”.[1][2]

I mean, the choice of (source code) license matters insofar as it does not prohibit code inspection (which none of the OSI-approved or FSF-blessed licenses do). The distribution license (which may differ from the source code license, as in the cases of RHEL & VSCode) matters less for FOSS…


Edit

If you (and your team, if you’ve got one) can afford enough time for such a project, and do have your own unique ideas to bring to the table, go ahead and build it! Literally no one (who is also building the “same” thing) is probably thinking of the feature set the way you might be / ought to, who knows.


  1. Secure encryption and online anonymity are now at risk in Switzerland - #30 by ignoramous ↩︎

  2. Trust assumptions in none-reproducible FOSS applications - #10 by ignoramous. ↩︎