Add AI Chat tools

Brave Leo is the best option for privacy among online providers if you use the open-weights models (Llama, Mistral)

Also please note that no online provider truly "protects your privacy". They merely implement technical solutions to avoid issues (Brave), or pledge/contract not to store your data, or only to store it for a limited period of time (Duck, and Hugging Chat?)

Ultimately, nothing is end-to-end encrypted, so it's roughly equivalent to using a private search engine. Hackers or law enforcement could still gain access.

Nowadays, we are starting to see amazing small models that are great at most basic language tasks and can be run on a high-end smartphone


Hi, I'm new here, but I'd like to share some of what I know on this topic for those who may not be very familiar with it. While I'm not familiar with using online providers, I can talk about locally run models on PC. (Incoming wall of text)

Currently, privacy is one of the biggest advantages of locally run models compared to online providers, as they can run completely offline and thus nothing needs to leave your computer (unless you want to remotely access your machine, which requires explicitly configuring it to do so). This also means you don't have to trust your inputs to any provider's practices or privacy policy, which is great for individuals and companies alike who may want to use sensitive information in their inputs (such as documents) without worrying about it being read, leaked, or anything else bad that could theoretically happen to such information with an online provider.

One issue with running them locally is having to choose between speed, intelligence/usability (not sure if that's the correct term), and/or context length. To best explain this issue, I will give an example (a rough memory estimate is sketched after the list below). Let's say someone has a computer with 16GB of RAM. That person will be able to run smaller AI models, but generation will be done on the CPU, which is quite slow. Under normal circumstances they have three options:

(1). Run the largest model and/or highest context length their system can handle (for example, Llama 3.1 8B at Q8), while accepting that generation speeds will be extremely slow and that they may run out of memory.

(2). Use a smaller model and/or a shorter context length so that generation speeds are higher and more usable, while accepting that the model won't be as good and/or that it may forget some things due to the shorter context length.

(3). Buy a GPU to upgrade the system, so generation now runs on it instead of the CPU, greatly boosting speeds and, depending on how much VRAM it has, allowing larger models and/or longer context lengths.
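To put rough numbers on that tradeoff, here is a quick back-of-the-envelope sketch (my own estimate, not taken from any particular tool) of how much memory just the weights of a quantized model take; real usage is higher once you add the context/KV cache and runtime overhead:

```python
# Back-of-the-envelope estimate of how much memory a quantized model's weights need.
# These are approximations only: real usage also depends on context length,
# KV-cache size, and runtime overhead.

def approx_model_size_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the weights alone, in GiB."""
    bytes_per_weight = bits_per_weight / 8
    return params_billions * 1e9 * bytes_per_weight / 1024**3

# Llama 3.1 8B at Q8 (~8 bits/weight) vs a typical Q4 GGUF (~4.5 bits/weight)
print(f"8B   @ Q8: ~{approx_model_size_gib(8, 8):.1f} GiB")     # ~7.5 GiB
print(f"8B   @ Q4: ~{approx_model_size_gib(8, 4.5):.1f} GiB")   # ~4.2 GiB

# Llama 3.1 405B is enormous even at ~4-bit quantization
print(f"405B @ Q4: ~{approx_model_size_gib(405, 4.5):.1f} GiB") # ~212 GiB
```

This is why a 16GB machine can hold an 8B model at Q8 or Q4, while something like 405B is out of reach without server-class hardware.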

Let's assume in our example we buy a GPU, say an RTX 4060 Ti 16GB, and install the necessary drivers. Now we can run the same model from option (1) at significantly more usable speeds, but now that we have dedicated VRAM in the mix, we have a new topic to discuss. When running a model that's loaded entirely within the GPU, you tend to get the fastest speeds your system can offer. But with tools like llamacpp, it's possible to load models split between VRAM and RAM, so our previous 16GB of RAM can also be used. This allows you to run even larger models than you could previously run on this system, but there's a catch: having even just one layer of the model's weights in RAM instead of VRAM already slows down generation, and the more layers you put in RAM instead of VRAM, the slower generation becomes.
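As an illustration of how that VRAM/RAM split is exposed in practice, here is a minimal sketch using the llama-cpp-python bindings (the model path is a placeholder, and the layer count is something you would tune for your own hardware):

```python
from llama_cpp import Llama

# n_gpu_layers controls how many of the model's layers are offloaded to VRAM.
# -1 (or any number >= the layer count) offloads everything; whatever is left
# over stays in system RAM and runs on the CPU, which is noticeably slower.
llm = Llama(
    model_path="./Meta-Llama-3.1-8B-Instruct-Q8_0.gguf",  # placeholder path
    n_gpu_layers=-1,  # lower this if the model doesn't fit entirely in VRAM
    n_ctx=8192,       # context length; larger values also cost memory
)

out = llm("Summarize why layer offloading affects generation speed.", max_tokens=128)
print(out["choices"][0]["text"])
```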

So we get into our previous three options again, which makes this a good opportunity to talk about the actual issue I wanted to bring up: cost. While it is theoretically possible to locally run something as large as Llama 3.1 405B (which is literally hundreds of gigabytes even at the lowest usable quantizations), you'd need to spend a LOT of money building a computer with enough RAM and/or VRAM to run such a model, while not necessarily getting better performance than something like ChatGPT/Claude, from what I've heard (I've never used them, so I don't know).

As for which backends to use for locally run LLMs on PC (I'm not familiar with how things are on phones, but I've been told it's possible), there are two big options from which, AFAIK, all backends right now are in one way or another derived: Llamacpp and Exllama2. The former allows you to split the model layers between VRAM and RAM, but requires models to be quantized into .gguf files, while the latter is slightly faster but needs to fit fully inside VRAM. I personally use the former through a fork called Koboldcpp, which has extra features built on top of it while also being one of the easier ones to install, as the devs provide precompiled binaries for Linux, Windows, and Mac, although you can compile it from source if you want to. It's not the fanciest in terms of UI, but it's perfectly usable, and if you so desire, there are other frontends available out there that you can use while still using koboldcpp as the backend.

As for security, I think a good rule of thumb to keep in mind is TO NOT RUN PICKLE FILES. I don't know the exact details off the top of my head, but it's an old file format with vulnerabilities that have been reported in the past, to the point that a new file format, ".safetensors", was created for newer models that explicitly lacks said vulnerabilities. If you're running llamacpp or one of its derivatives, you will be looking for ".gguf" models, or you can quantize the model's original ".safetensors" files yourself.
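To make the difference concrete, here is a small sketch assuming the safetensors and PyTorch packages are installed (filenames are placeholders):

```python
# Loading weights from a .safetensors file only parses tensor data and metadata,
# so it cannot execute arbitrary code the way unpickling can.
from safetensors.torch import load_file

state_dict = load_file("model.safetensors")  # safe: no code execution

# By contrast, a pickle-based checkpoint is deserialized with Python's pickle,
# which can run arbitrary code embedded in the file. Avoid untrusted
# .bin/.pt/.pth files, or at minimum load them with weights_only=True
# on recent PyTorch versions:
# import torch
# state_dict = torch.load("model.bin", weights_only=True)
```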

There's a lot more stuff to talk about, but I don't want to make this post longer than it already is. Also, most of the information here is from memory, off the top of my head, so I hope I haven't missed or gotten wrong any important details.


Welcome to the forum! If you want this topic to be approved, please vote (go to the top of the topic). Unfortunately, the PG team doesn't seem interested in approving this topic, even though there is a fully ready PR.

I would like to either link to an article covering all the details about running AI locally, or maybe we need to write our own. This could be in a blog post.

Is there any advantage to it compared to Llamacpp?

I will complete my answer later today.

You can see the recommendations in a preview at https://2525--glowing-salamander-8d7127.netlify.app/en/ai-chatbots/

Absolutely!

I don't know exactly where to look for articles or blog posts, but one important thing to mention is that for GPU acceleration, not all GPU vendors are the same. This is because for the longest time, the AI field has depended on CUDA, which is proprietary Nvidia technology, and while these days other graphics cards can be used, they are still FAR behind Nvidia, and support for those non-Nvidia cards will often require more steps to get working, have problems Nvidia users won't have, lack compatibility with certain features, or straight up not work at all.

Assuming you're using an Nvidia GPU, once you've installed the usual Nvidia drivers and CUDA (I think you need to install it too, at least on Fedora, but I'm not 100% sure), all you need to do is install your selected application.
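A quick way to sanity-check that the driver/CUDA stack is actually visible before blaming the application is something like this (a sketch using PyTorch, which is just one of several ways to check):

```python
import torch

# If this prints False, the backend will silently fall back to CPU inference.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    free, total = torch.cuda.mem_get_info()  # bytes of free/total VRAM
    print(f"VRAM: {free / 1024**3:.1f} GiB free of {total / 1024**3:.1f} GiB")
```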

I will mostly talk about koboldcpp, but for other applications, you will almost always have to use the terminal to both install and run the backend, with the UI living at a localhost address accessible through a web browser of your choice. I don't recommend running said backend (or in koboldcpp's case, its binary) as a background process (as in, without a terminal window for it), since closing your web browser won't close the application due to their nature, and if you don't have a terminal window open for it, you will either have to find its process in your task manager application or reboot your system.
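For what it's worth, once the backend is running, frontends just talk to its local HTTP API, so you don't strictly need the bundled webui. A minimal sketch, assuming koboldcpp is listening on its default port 5001 and exposing its KoboldAI-style /api/v1/generate endpoint (check your own install's port and API docs):

```python
import requests

# Assumes koboldcpp is already running locally with a model loaded.
payload = {
    "prompt": "Explain what a context window is in one sentence.",
    "max_length": 120,    # tokens to generate
    "temperature": 0.7,
}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```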

Inside the webui, there should be sampler settings you can configure, which I would link here, but apparently there's a limit of two links per post, and there's some other stuff I think is more important to address first.

Some particular quirks I’ve encountered:

  • The wiki on koboldcpp's GitHub still has a good chunk of outdated information despite the project being very active. An example of this is mentioning TheBloke, who no longer uploads .gguf quants these days. Currently, this space has been filled by bartowski and mradermacher.
  • Tor Browser can’t access the localhost webui due to the whole onion circuit thing, so it’s best to use another browser for the webui of your AI application.
  • If using a flatpak for your web browser, don't disable its network permission (e.g. in Flatseal), since that also prevents the browser from accessing the localhost webui for some reason. I currently don't know an easy solution for this, as I'm not experienced with firewalls.
  • I once found a thread on reddit about a user having issues with images for llava/AI vision, or AI images generated within koboldcpp's webui, not showing correctly, but they managed to fix this by disabling resistFingerprinting in about:config (it's enabled by default in Arkenfox, Librewolf and Mullvad). While I don't think this is anything malicious from koboldcpp, and it's just some aspect of resistFingerprinting that breaks this functionality, I recommend using a separate web browser for the webui of AI applications if your threat model requires your main browser to have resistFingerprinting enabled and you want to utilize Llava or SD image generation within koboldcpp. I personally don't use those features since I don't have a use case for Llava, and the image generation is very basic since kobold's focus is text generation, and while I could go on and talk about local AI image generation, it's a whole new subject that I assume is a bit out of scope for this thread.

exllamav2 is a library that's supposed to have faster inference speeds, but from my experience the difference is not that big compared to GGUF models (like 1 to 2 t/s on my machine), and it isn't capable of splitting the model's weight layers between RAM and VRAM like llamacpp and its derivatives. Still, it's another option for those interested. Currently, I'm aware that tabbyAPI (its official backend) and oobabooga's text gen webui support it, although the former is backend-only and needs a separate webui of your choice, while the latter uses gradio for its webui, whose analytics the oobabooga devs had to disable.

Might be worth mentioning koboldcpp supports llava too. Don’t have much to say about the others since I never utilized them before.

Quoting myself here to continue on this subject: I found some more info about the pickle subject, from both python and huggingface (I won't link it due to the limit, but it's a bit outdated anyway, since unlike back then, models are now all distributed as .safetensors or .gguf).

Also, KoboldCPP offers horde, which is an OPT-IN feature you have to explicitly configure, where you can volunteer your PC to host a model of your choice for other koboldcpp users to use freely. While it's a nice idea in theory, for privacy reasons I STRONGLY RECOMMEND AGAINST USING THE HORDE, since generations will be done on the volunteer's PC and not yours. As for those who may want to volunteer, I'm not aware of how it works or whether there are any settings you can enforce, but be warned that horde users will use the model you host for inference on your machine with all kinds of different prompts, which I personally do not feel comfortable with.

Well, that took quite some time to write up. Assuming there are no errors or wrong information, I hope this helps a little bit.

Has no one posted this one here yet, or did I miss it?


I want to add Notebook LLM, which seems very promising. I have yet to test it, but I will in the coming week. With a powerful laptop, it meets most needs. They recently integrated ollama, so you don't have to download models separately.
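For reference, ollama exposes a local HTTP API (port 11434 by default), which is what such integrations typically talk to, so prompts stay on your machine. A rough sketch, assuming ollama is running and a model tag like llama3.1 has already been pulled:

```python
import requests

# Assumes ollama is running locally and the model has already been pulled.
payload = {
    "model": "llama3.1",  # placeholder: any locally pulled model tag works
    "prompt": "Give me one privacy benefit of running models locally.",
    "stream": False,      # return the full response as a single JSON object
}
resp = requests.post("http://localhost:11434/api/generate", json=payload, timeout=300)
resp.raise_for_status()
print(resp.json()["response"])
```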

@Average_Joe

@anon80779245 already created a PR which explains it. But Leo is an excellent choice: it's directly integrated into the browser and requires no account. If you need more capabilities, you can use Hugging Chat, which is getting better day by day.

Is there a way to integrate "private"/"anonymous" cloud solutions (Duck, Leo) into something like ollama or a similar TUI interface, for a private AI companion outside the web browser?

Also, has anyone looked into Mycroft? It seems cool and interesting since it can integrate with the OS and do stuff, albeit maybe a bit risky, though I'm sure you can sandbox it to limit its capabilities.

Unmaintained, unfortunately.
As for the two "spiritual successors", one needs a Raspberry Pi, and the other is a CLI with a difficult setup.

That's why I didn't put in any* browser-based interface; it really adds a lot of complexity and creates problems. To be fair, I tried using oobabooga and it looked great, until it didn't work. Admittedly, I was trying to get it running with Molmo, which doesn't have any gguf files, just safetensors.
(*llamafile is browser-based, but it's a one-step setup.)

Lmstudio.ai again, and its design is very neat. It now has .deb packages alongside the .AppImage ones. Unfortunately, it is closed source. Do you think it would be appropriate to have proprietary LLM apps?

Also, GPT4ALL is a really powerful app, and it even has a Flatpak. The issue is that the Flatpak is community-maintained, and maybe some in PG don't like that. I would like feedback on the GPT4All experience, especially from those using distros recommended by PG. (I personally use Ubuntu.)


That would cut out a huge chunk of great AI tools and frontends, which almost always use webuis, and would also limit PG in the future if it ever includes non-LLM AI tools. Also, PG includes recommendations for things like Arch Linux and other more advanced self-hosted stuff, so why not include them here too?

I can't speak for everyone or for PG, but when so many of the AI tools out there are open source, being proprietary is kind of a big disadvantage; such a tool would need a really nice feature other tools lack, IMO.

I haven't heard of or used it, and have nothing against it either, but, assuming flatpak is the preferred option, would different distros make any difference here?

I never said, nor intended, this to be a criterion. I just stated my current opinion.

The issue is that the flatpak is community-run, which is fine by me, but I know that Privacy Guides usually doesn't like it when a package doesn't come directly from the developer. So it would need to be from the releases page. The issue is that, officially, it only supports Ubuntu; see gpt4all/gpt4all-chat/system_requirements.md (at commit 62f90ff7d5e9ed7796d3c3b761f68d1a1b49ad1a) in the nomic-ai/gpt4all repository on GitHub.

I mean, why not include something like oobabooga? But someone would have to actually test it and write a PR to add it.

While it shows as unverified on Flathub, there seems to be a merged pull request to add the flatpak manifest to the gpt4all repo, although I'm not sure whether this means anything or not.

It says that it is community-maintained, but I am not sure whether that means they aren't doing QA for it, or something else.

See the nomic-ai/gpt4all repository on GitHub: "GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use."

Hello everyone.
Thank you for all of your inputs.
First, do check the preview, and let me know what could be improved.

Secondly, I am still waiting for @team to approve the PR, although @rollsicecream has helped a bit. Ultimately, I also need @jonah to approve the PR.

I hope this can move forward quickly.

Have a nice day.


Thanks for putting a PR together! The recommendations seem solid.

However, please take care not to anthropomorphize LLMs and feed the big tech AGI narratives. Use "interact with LLMs" instead of "talk to". Also, the intro text could explain what LLMs are (statistical token prediction machines), that they are extremely computationally expensive, inherently biased, and trained (lossy compression) by scraping large swaths of the public-facing internet without consent, attribution, or compensation.


Done

I just added the ethical concerns of LLMs in an info box at the beginning.


The privacy purpose of free and open-source software is that the code is open to audit. In other words, you can confirm there is no telemetry phoning home, there isn't a secret keylogger, there isn't something malicious that modifies system files, etc.

Since we can't audit the systems or their training regime, we shouldn't call them open source; I guess we agree on that? The Open Source Initiative has released their definition (Open Source AI – Open Source Initiative); I'm not sure how much credibility they wield in defining terms, but I guess something is better than nothing?

I am not saying that we should boycott chatbots; they are a tool, and whether I like it or not, they will be used, for better or worse. I was just pointing out the obvious pitfalls that deserve some mention when Privacy Guides recommends these systems. Since the companies fail to mention these, at least they should be mentioned in a recommendation by PG.

IMHO, the quality argument is kind of important because of how people anthropomorphize these stochastic parrots (source). All these resources mention that the models can produce wrong answers, but that's about it. There are no guardrails in place for people, and the companies have no incentive to put this information in bold letters on their webpages, because they know this generates more hype and in turn more value for the company.

Yes, we agree. That's why the page refers to open weights, not open source.

Look, we can't talk about everything on a tool page. At some point, we should write a complete guide, but for now all we have (if @team decides to care) is a tool page. I can mention hallucinations, but that would just be in an info box. We aren't here to educate people about AI in general.

We can’t audit the binary blobs in the Linux kernel either, but we still consider it open source. Not to be pedantic, but there are different levels of what to consider “open source.”

Consider this example:

LLM Chatbot #1 is entirely proprietary. A user installs it on their computer, it shares 100% of the user’s inputs with its home company, greatly compromising privacy.

LLM Chatbot #2 is a proprietary install but uses an “open” model like llama. Still, the proprietary client installs telemetry, a keylogger, malicious code, etc. This also compromises privacy and security.

LLM Chatbot #3 is an open source client that uses an “open” model. The training method is a black box, but the client can be audited to prove that it is not sharing any data with a home company, there are no keyloggers, malicious code, etc. It is unknown exactly how it produces its outputs, but the user can make the decision whether to trust the outputs or how to use it.

LLM Chatbot #4 is an open source client that uses a fully open model with available training data, weights, methods, etc. This would obviously be ideal if it could be done well, but it doesn’t happen often and it could be pretty hard to make it happen.

In my opinion, PG ought to consider both type 3 and type 4 LLMs, since it is far better to move from type 1 or 2 to type 3 than to refrain from doing anything. A type 3 chatbot succeeds at protecting user privacy and data and at protecting against malware, whereas type 4 additionally helps highly specialized power users better understand their results and be better able to trust the content of their outputs.


Do we have any update on the PR? It seems to be quite solid.

We can’t audit the binary blobs in the Linux kernel either, but we still consider it open source.

I'm not well versed in any current development of the Linux kernel, but a binary blob in the Linux kernel that is not auditable seems kind of wrong. I will look it up when I can.

About the different kinds of chatbots, I'd repeat the most generic thing I say: applying very specific yet borderline unacceptable terminology from software engineering to chatbots does not make sense, because the field is not as monolithic as the terms used in your example try to imply. I do not mind closed-source/proprietary tools getting recommended for chatbots, because the reality is that they are the hottest thing on the market, and keeping them closed source gives you a competitive advantage when this is in itself a hundred-billion-dollar industry. Calling these models open source is where the problem lies. Open weight is not open source; when there is a distinction, why not use the correct terminology?