Ollama and Alpaca are still good options when compared with Copilot and the like.
I think we should stop calling Meta’s LLMs (i.e. Llama) open-source LLMs. It’s practically akin to saying Android (not AOSP) is open source. Meta has released the weights of the model, and you are allowed to fine-tune it, use it commercially, etc. But it is not open source. At least LLM researchers, and noobs like me, are not going to call it open source. It will be open source when we can see the training data and the code for the model.
A new (truly) open model has been released. AI2's Molmo shows open source can meet, and beat, closed multimodal models | TechCrunch
On another note, @dngray could you review the PR? It is basically ready now.
IMHO, we should not add anything AI related. As someone working in the field, here are my two cents on why:
- the landscape is evolving so quickly that we wouldn’t know what to recommend
- most of them are secretive about their RAG, continuous learning, and caching of interactions
- another big reason is that THESE MODELS HALLUCINATE, A LOT
- WE SIMPLY DO NOT KNOW WHETHER THE DATA THESE BOTS WERE TRAINED ON WASN’T SCRAPED FROM PEOPLE’S CONVERSATIONS
We can’t talk about privacy and then do a 180° turn and recommend Llama, which may have been trained on data including, but not limited to, the public Facebook photos of Australian users, including children if those photos were posted by adults. Did they also use messages from the era before Messenger was E2EE? Google Research, FAIR, MSAI, none of them say anything about their data or how it was collected, nor does OpenAI. Recommending chatbots on this platform amounts to: “Hey yeaaah, these companies might have harvested all your data to train these models, but because it’s hidden behind this black box of a model, we don’t think it’s a privacy violation or anything of that sort.”
There is also a worrying trend of using chatbots as personal companions, for venting, or for mental health questions, which is through the roof especially amongst young people, who may not really grasp the intricate details of hallucination, data leakage, and contamination, amongst other things that go on in these models.
Forgot to tag @jonah
But the point of PG is to help users preserve their own privacy while using modern tools with workable systems.
Whatever the limitations of AI, people can judge for themselves whether it is useful, but they need assistance determining how to use it in privacy-preserving ways.
Clearly, using Ollama locally is better for privacy than logging into OpenAI and using GPT, or using Microsoft Copilot with your account.
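As a rough illustration of what “locally” means here (a minimal sketch, assuming Ollama is installed and a model such as llama3 has already been pulled; the model name and prompt below are just placeholders), a prompt can be sent to the local instance without touching any external service:

```python
# Minimal sketch: query a locally running Ollama instance over its HTTP API.
# Assumes Ollama is installed and `ollama pull llama3` has been run;
# the model name and the prompt are only illustrative.
import json
import urllib.request

def ask_local_model(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for a single JSON response instead of a stream
    }).encode("utf-8")

    # Ollama listens on localhost by default, so the prompt never leaves the machine.
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(ask_local_model("Explain what an open-weights model is in one sentence."))
```

Everything stays on localhost, which is exactly the property you can’t get when the assistant runs behind someone else’s account and servers.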
Again, the point isn’t just about preserving one’s own privacy while using tools. IMHO, it’s kind of the polar opposite of promoting privacy to publish a guide for tools created through gross violations of user privacy.
IMHO, we don’t need to have a guide for everything just because it’s the shiny new thing now.
It’s like owning the newest Nike kicks/iPhone and preaching about sweatshops in Bangladesh/China.
To whatever extent privacy was violated to make the models that run in Ollama, it has already happened. Using the tool doesn’t change that.
The damage is already done and lawsuits are underway. We can’t go back. In this context, PG should recommend what is best among what is available.
You are right. But users are going to use AI anyway. It would be good to offer them some alternatives that are less likely to invade their privacy. If we’re going to talk about externalities, we have guides on macOS and iOS, so should we then consider the child labor behind batteries, exploited labor in China, etc.? I know what you mean, and you are right. But practically, it’s not a good idea to say nothing about AI if you want to be an authoritative source on privacy.
So, being closed source, potentially trained on private data, making unverifiable claims, and facing repeated accusations of including the test set in training to inflate results isn’t enough of a dealbreaker to not have an official guide on them, or to just say “Hey, we are waiting this one out until we get some clarification from the devs/companies making them”?
@benm I agree with some of what you said, although it hasn’t “already happened.” Ollama is just an interface for running the actual models, like Gemma and Llama, amongst others. They will always release new versions, because that’s how fast the landscape changes in NLP, and they will keep changing their policies without notice. So it hasn’t already happened; it is happening and will continue to happen.
@win11.shading291 The lawsuits, IIRC, are about infringing the copyrights of big players like NYT and SMG; I don’t recall anything about the use of user data from their platforms, because we don’t know what the models were trained on.
P.S.: I am a bit biased about all of this, because researchers from my former department are the ones who published details of how these models are “cheating” on their evaluations by including the test set in their training.
AI is a tool, and I understand the concerns. However, we generally don’t evaluate ethical concerns. As you said, phones are made with exploitative processes, yet we still recommend buying one, the Google Pixel.
As for the license, Meta has never changed the license of a model after releasing it. They did, however, make the Llama 3.1 license freer by allowing the model to be used for AI training.
People are free to use the model they want, with varying degrees of openness. But open-weights models can’t be defined as closed source.
The training-on-private-data bit, yes, it is true. But on the other hand, anything not private is public. That what you say on forums or other privacy-invading platforms gets sold, used, etc. is nothing new.
Unless there are reliable data extraction methods, I don’t see this as a big enough concern not to list them.
For the hallucinations, feel free to add a warning in the GitHub PR.
I agree, but nonetheless, if this topic is avoided, users will be left uninformed and most people will simply use what’s available.
Some are literally using your personal data to train their crappy AI models. I’ll use an example from Microsoft’s privacy policy:
> As part of our efforts to improve and develop our products, we may use your data to develop and train our AI models.
This is just one of many examples.
Off-topic(ish)
Honestly, I can’t see how anyone can bear to use Copilot, especially with its crappy responses and weird hallucinations that are somewhat amusing yet horrid.
Open-weight models are closed-source models. The closest software analogy I can think of is: these are freeware with the option to add your own plugins (i.e. fine-tuning).
Unless you can simply take the data and code and reproduce the results they got, it’s not open source. Can’t believe people here are falling for FAIR’s bullshit marketing crap.
> But on the other hand, anything not private is public.
I would consider that a very wild take, especially on this website.
> Unless there are reliable data extraction methods
There are; they’re just expensive and would take time, making them lose their edge in a cutthroat field.
The privacy purpose of free open source is that code is open to audit. In other words, you can confirm there is no telemetry phoning home, there isn’t a secret keylogger, there isn’t something malicious that modifies system files, etc.
While there are plenty of other reasons one might want FOSS, these are the reasons related to privacy. Regardless of whether you categorize AI as “FOSS” because of details like the weights, the training data, etc., none of the privacy concerns normally associated with proprietary software apply.
Ultimately the main objections are:
- The boycott argument - these models are doing bad things to get their training data, or cheating on benchmarks, so we should not use them because we have a duty to boycott them
- The quality argument - these models produce hallucinations or other results that are not good.
Personally, I think the boycott argument doesn’t work, since PG already recommends tools for privacy-violating platforms anyway. Whether you boycott is a personal decision, not everyone is an activist, and it’s not even clear how boycotting would stop or slow down the practices you oppose. More likely it just leaves people behind in an emerging technology.
The quality argument also doesn’t work, because people can decide for themselves whether the quality is high enough for their use case.
There is another fully open model from AMD.
@brivacy Yeah, Molmo looks great.
I am a bit discouraged by the fact that Privacy Guides seems completely uninterested in approving the PR, even though this is a popular topic.
I appreciate your reply!
Is there an AI tool that’s connected to the Internet that protects user privacy?
I’ve been using Brave’s “Leo AI” since Brave has a great reputation in protecting user privacy.
duck.ai or HuggingChat, as discussed above.
Edit: forgot to add link.
I appreciate your reply!
How does Brave’s Leo AI compare for user privacy?