Add AI Chat tools

First of all, we can compare it to Brave’s way of doing things. Like DDG, Brave doesn’t log IP addresses or other identifiers. But unlike DuckDuckGo, they don’t store chats at all. Just think about it: if providers such as OpenAI and Anthropic don’t need our data to train their models, why would they keep it for 30 days? It doesn’t make any sense. In the end, a promise to not train on the data and to delete it after a set number of days is just a pinky promise. Furthermore, Duck changed the policy: they now state that the providers can always keep the data for “safety & legal”, whatever that means.

AI chat tools are basically like search engines. Just think how crazy it would be if DDG or their third-party provider said openly that they will keep ALL search log data for 30 days, and maybe longer.

Now, I do agree that Duck is providing something that no one else does, but at the very least I would need to see the marketing change: they currently brand the chatbot as anonymous, while with all these policies and this data retention it is clearly not anonymous at all. So I wouldn’t feel comfortable recommending them.

Below are answers from Duck Public Relations:

Hi there – thanks again for reaching out. Here’s an FAQ about AI Chat in case it’s helpful. Sharing responses below:

1/ What is the agreement between Duck and together.ai?

  • Similar to other model providers, our agreement allows us to use the Llama and Mistral APIs anonymously and stipulates that no chats can be used for model training, etc.

2/ For how long (precisely, please) are the chats retained, and for what use?

  • From the FAQ:
    • DuckDuckGo AI Chat does not record or store any of your chats, and your conversations are not used to train chat models by DuckDuckGo or the underlying model providers (for example, OpenAI and Anthropic).
    • All metadata that contains personal information (for example, your IP address) is completely removed before prompting the model provider. This means chats to Anthropic, OpenAI, and together.ai (which hosts Meta Llama 3 and Mistral on their servers) appear as though they are coming from DuckDuckGo rather than individual users. This also means if you submit personal information in your chats, no one, including DuckDuckGo and the model providers, can tell whether it was you personally submitting the prompts or someone else.
    • In addition, we have agreements in place with all model providers that further limit how they can use data from these anonymous chats, including the requirement that they delete all information received once it is no longer necessary to provide responses (at most within 30 days with limited exceptions for safety and legal compliance).
    • You can read the DuckDuckGo AI Chat Privacy Policy and Terms of Use here.

3/ Why is there an exception for the storage duration of chats “for safety and legal compliance”?

  • This is a compliance requirement passed through by the model providers; however, none of the conversations can be tied back to any individual user, as we do not send any identifiable metadata to our model providers.

4/ Does the delete conversation button actually delete it from the servers of the provider (together.ai, Anthropic, OpenAI)?

  • The action is called Clear Conversation, a.k.a. using the Fire Button. It clears the conversation on the client side. We already don’t store chats on the server side, so there is nothing to delete on our side, and the same goes for OpenAI, together.ai, and Anthropic, as the data is deleted except if there is a compliance issue as described above. Again, to be clear, none of the conversations can be tied back to any particular user, as we do not send any identifiable metadata to our model providers.
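
For what it’s worth, the anonymization they keep describing boils down to a proxy that strips identifying metadata and forwards only the prompt text, so every request appears to come from the proxy itself. A minimal sketch of that idea (hypothetical endpoint and key names; this is not DDG’s actual code):

```python
# Minimal sketch of an anonymizing AI-chat proxy (hypothetical, for illustration).
# The proxy terminates the user's connection, drops IP/headers/cookies, and
# forwards only the prompt, so the provider sees every request as coming from
# the proxy rather than from an individual user.
import requests

PROVIDER_URL = "https://api.example-provider.com/v1/chat"  # hypothetical upstream
API_KEY = "one-shared-proxy-key"  # a single key for all users, not per-user

def relay_prompt(prompt: str) -> str:
    # Only the prompt text leaves the proxy; nothing in this request
    # identifies which user submitted it.
    resp = requests.post(
        PROVIDER_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"messages": [{"role": "user", "content": prompt}]},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

Of course, whether anything is retained after the request reaches the provider is exactly the part we have to take on trust.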

I don’t know what their reasons are, but I can think of some potential reasons other than training the models (also, in the case of together.ai, AFAIK they don’t make any models of their own to train).

My assumption is that the 30-day thing is for some combination of anti-spam/abuse policies, assisting in troubleshooting, and possibly some CYA policy.

Sure, OpenAI and Anthropic could both be blatantly lying, but why would they? 99.9% of their users are going to be using their services directly; why would they need that last 0.1% of training data at the risk of lawsuits and scandal?

this is just a pinky promise

Agreed, it is definitely just a pinky promise (but a no-log policy would be no different; we must inevitably trust both Brave and DDG, and by extension the upstream providers, if you choose a model hosted upstream). This is true for search engines as well.

I don’t see this as a dealbreaker, but it is the primary reason offline models are categorically more private. Even in the case of a model hosted by Brave, it still falls into the “trust us bro” category: what Brave or DDG do or do not do with our data once it hits their servers is not controllable or knowable by us, and that is even more true of their upstreams (when an upstream is used).

Duck changed the policy: they now state that the providers can always keep the data for “safety & legal”, whatever that means.

A little too vague for my comfort, but it also doesn’t seem entirely unreasonable from the service providers’ point of view, considering that people are constantly probing these models, trying to get them to do things they aren’t supposed to do or say.

Does Brave not have that clause in their T&Cs for OpenAI or Anthropic?

Just think how crazy it would be if DDG or their third-party provider said openly that they will keep ALL search log data for 30 days, and maybe longer.

I’ve always assumed that the upstreams (Bing, or in the case of Startpage, Google) probably do store queries (or data derived from queries) for some amount of time, possibly indefinitely.

And DDG is not saying they keep logs of conversations; I think they are explicitly saying that they don’t.


I’m not satisfied with their answer to question 4/; it doesn’t quite make sense to me.

specifically this bit

so there is nothing to delete on our side, and the same goes for OpenAI, together.ai, and Anthropic, as the data is deleted except if there is a compliance issue as described above

They imply, but never explicitly guarantee, that the upstreams don’t store data except when there is a “safety or compliance issue”. But as best I understand it, the <30-day policy is a general policy, not just a safety-and-compliance one. If the upstreams only store data in cases of safety and compliance issues, then DDG could and should make stronger guarantees in their privacy policy; if not, they shouldn’t be implying that the upstreams will immediately discard data (unless I’m misinterpreting something).

Follow-up after looking at Anthropic’s privacy policy:

This may be a case where DDG is just being more explicit and transparent than Brave has been. Brave has been somewhat resistant to adding clarifying language or a disclaimer about the Anthropic models.

If we look at Anthropic’s Privacy Policy, it says more or less the same as what DDG told you:

  • For all products, we automatically delete inputs and outputs on the backend within 30 days of receipt or generation, except when you and we have agreed otherwise, if we need to retain them for longer to enforce our Usage Policy (UP), or comply with the law.

  • For all products, we retain inputs and outputs for up to 2 years and trust and safety classification scores for up to 7 years if you submit a prompt that is flagged by our trust and safety classifiers as violating our UP.

I’m curious whether, if Brave were directly pressed to answer explicitly, they’d also acknowledge this caveat for “trust and safety”.


Side note, it appears that Anthropic does have an option for a no-retention policy, but apparently neither Brave nor DDG qualified (or requested it?):

If you are a business or enterprise customer, we may explain our data retention periods in your services agreement with us, as applicable, and if you have been approved for zero retention

Well, maybe to detect if some entities are using it for AI training or something? But 30 days is a lot. @forwardemail, which as a mail provider knows a thing or two about fighting spam, only retains spam-related info (I can’t remember exactly what info) for 14 days.

Not sure I understand; troubleshooting a bad answer from the AI?

While we don’t have numbers, my guess is that it is way more than 0.1%. For example, many people use Poe. Now, if Duck has a special agreement with OpenAI, then this could be different. But the API has a no-training policy, so I am not sure they have a special contract.

I do trust both companies, but not OpenAI, which has lied in the past. They claim you can now access ChatGPT without an account, while this doesn’t work for many people. (Even without a VPN, I couldn’t use it without an account.)

Agreed.

Yes, but it’s very difficult to lie about not keeping data at all, while once you keep the data for x days, all sorts of things can go wrong.

Brave’s terms state that:

  • Immediate discarding of responses: Conversations are not persisted on Brave’s servers. We do not collect identifiers that can be linked to you (such as IP address). Responses generated with Brave-hosted models are discarded after they’re generated, and not used for model training; no personal data is retained by Brave-hosted AI models.
    • Note that some non-Brave hosted models (such as Anthropic) will have different data retention policies. If you select a model from Anthropic and submit Leo queries, that data will be processed by Anthropic, and retained on Anthropic’s servers for 30 days. Anthropic acts on our behalf as a data processor for any personal data submitted, but does not use any personal data for its own purposes or to train its AI model. Anthropic are also not allowed to share query inputs and outputs, or link them to Brave.

It seems that Brave may have a special agreement with Anthropic, but maybe not. Since you can use other models, it doesn’t really matter.
BTW, there is a warning on the PR about using Claude.

Probably limited to companies handling sensitive information (basically any big company). But Anthropic wouldn’t allow any customer-facing tool to have this right.

This doesn’t apply to Brave end users. Only Brave’s terms can apply to Brave users, unless specified otherwise.

BTW Brave does warn that “You should not submit sensitive or private information in Leo” which I really like.

I legitimately think PGs shouldn’t wade into this until the tech is more settled, use cases are clear, and the ethical concerns have been resolved.


Good news! Duck confirmed that they have disabled chat history when using together.ai. I think we can safely add them back, although with some warnings about not using closed-source models and about the fact that the open-weights models aren’t hosted by Duck.

New answer from Duck’s PR:

Thanks, again for your follow-up. Below, I’ve provided responses:

1/ Ah sorry about the misunderstanding — yes, we have disabled chat history by turning on the option to “not store prompts and responses” on the together.ai platform.

2/ As outlined in the Privacy Policy, we have agreements in place with all model providers that further limit how long they can retain data from these anonymous chats. They are required to delete all information received once it is no longer necessary to provide responses. Most of the time this happens immediately, but at most within 30 days (with limited exceptions for safety and legal compliance).

3/ The model provider doesn’t know who enters the queries, making them anonymous [as stated in the Privacy Policy: “If you submit personal information in your Prompts, it may be reproduced in the Outputs, but no one can tell (including us and the underlying model providers) whether it was you personally submitting the Prompts or someone else.”]. It’s also important to note we don’t allow documents.

Thanks, again.


What are the AI tools that don’t need to be connected to the Internet and can just run on a person’s PC, 100% locally?

Can these AI tools that are run locally compete with the AI tools that need to be connected to the Internet 100% of the time?

It seems like asking an AI that’s connected to the Internet your most personal questions would potentially be a huge privacy breach?

Modern AI is too new for people to be declaring it useless, making reductionist claims about its ethics, or dismissing it as a nonsense generator. AI is a big deal, period. Whether we like it or not, the reality is that it’s going to be here for a long time and it will change the landscape. Given the hazards for privacy, it’s worth thinking about it now, not dismissing it as something with no use.

Ethics are important to consider, but the internet in large part does not care. Personally I find all pornography unethical but it does not change the fact that everyone who is on the internet is impacted by it being there. No matter how unethical people consider AI, it’s going to remain.

I can think of a couple dozen applications of AI that are being worked on now that are not consumer grade yet but likely will be.

  • I have a folder full of PDFs; let me ask some questions using those PDFs as sources (see the sketch after this list).
  • I have a folder full of images; move every meme into a dedicated folder, every photo into another, and every screenshot into another. Now find me the screenshot where so-and-so said something about X.
  • Recommend some libraries that have X features and compare the advantages and disadvantages of each.
  • Generate images following a certain set of style guidelines to use in my articles.
  • Explain why I made this error on this math problem.
  • Here is a photo of my plate; estimate the quantities of food, the calories, and the protein, and log it in my fitness app.
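
To make the first item concrete, here is a toy sketch of “ask questions over a folder of PDFs” against a local model. It assumes an ollama server on localhost and the pypdf package, and uses naive keyword matching to pick a document; a real pipeline would chunk the text and use embeddings:

```python
# Toy "ask my PDFs" sketch (illustrative, not a production RAG pipeline).
# Assumes an ollama server on localhost:11434 and `pip install pypdf requests`.
from pathlib import Path
import requests
from pypdf import PdfReader

def load_pdf_texts(folder: str) -> dict:
    # Extract raw text from every PDF in the folder.
    return {
        p.name: "\n".join(page.extract_text() or "" for page in PdfReader(p).pages)
        for p in Path(folder).glob("*.pdf")
    }

def ask(question: str, folder: str = "./pdfs") -> str:
    docs = load_pdf_texts(folder)
    # Naive retrieval: pick the document with the most word overlap.
    words = set(question.lower().split())
    best = max(docs, key=lambda name: len(words & set(docs[name].lower().split())))
    prompt = (
        f"Answer using only this document:\n{docs[best][:4000]}\n\n"
        f"Question: {question}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",  # ollama's local API
        json={"model": "llama3", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```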

I can think of many more such examples, and the only limitation is the tools being refined and made more user-accessible. That’s happening now, so we should be thinking about how to keep this from being more dystopian than it has to be.


Those are ollama, Kobold.cpp, and llamafile.

Look at the PR for details.
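
To give a feel for how simple fully local use is, here’s a minimal chat loop using the official ollama Python client (assuming ollama is installed and you’ve already pulled a model; adjust the model name to taste):

```python
# Minimal local chat loop with the ollama Python client (pip install ollama).
# Assumes the ollama server is running and `ollama pull llama3` has been done.
import ollama

history = []
while True:
    user = input("you> ")
    history.append({"role": "user", "content": user})
    # The request goes to localhost only; no prompt ever leaves the machine.
    reply = ollama.chat(model="llama3", messages=history)
    answer = reply["message"]["content"]
    history.append({"role": "assistant", "content": answer})
    print(answer)
```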

Not really. They are slower. And while the best open models are not that far behind the best proprietary ones (https://leaderboard.lmsys.org is a website where users can submit a prompt to two random models and blindly choose the better answer; from those results they build a leaderboard of the best models), your machine will not be capable of running the >70B models.

If you don’t submit PII, that’s fine. Please look at the PR on GitHub, where there is more explanation.


Some relevant (and positive) news today from Mozilla. PG Forum Post | Nightly Blog

Nightly can be configured by advanced testers to use custom prompts and any compatible chatbot, such as llamafile, which runs on-device open models, including open-source ones. We are excited for the community to share interesting prompts, chatbots, and models as we make this a better user experience. We are also looking at how we can provide an easy-to-set-up option for a private, fully local chatbot as an alternative to using third-party providers.

I’ve currently got Firefox Nightly set up to use Open WebUI + Ollama behind the scenes; it’s really easy to set up.

Apart from opting in to the experiment and using Firefox Nightly, there are a few about:config settings that need to be changed:
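
If I’m remembering the Nightly blog post correctly, the prefs in question are roughly these (pref names reconstructed from memory, so double-check them against the blog):

  • browser.ml.chat.enabled → true (enables the chatbot experiment)
  • browser.ml.chat.hideLocalhost → false (lets you pick a localhost provider)
  • browser.ml.chat.provider → http://localhost:3000 (wherever Open WebUI is listening)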

Note: the last one is particular to my setup; depending on yours, you might use a different port or something other than localhost.

OTOH, this could be an opportunity for the PG team members to join the other pioneers of the conversation when it comes to privacy.

I don’t particularly want PGs to be a pioneer at anything. I want it to fairly conservatively recommend things that are likely to be stable. Otherwise you get embarrassing climbdowns like Skiff.

Moreover, it hasn’t been settled that this whole endeavour is even ethical. As has been stated above, piracy is a perfectly valid privacy-respecting option, but that doesn’t mean I want the team in the business of determining which piracy sites are best.

I can only speak for myself, but I wouldn’t want a focus on something like this until there’s more technological and ethical stability.


It’s true that, except for GPT4All, all projects mentioned in the first post of this thread are unmaintained. But projects that have made it this far are likely to subsist, whether because of a strong community (Kobold) or “corporate” backing (GPT4All, ollama, etc.).

What specific ethical concerns do you have about AI? To be clear, the only reason Privacy Guides isn’t recommending pirate sites is that they are illegal, not ethical concerns. After all, we recommend ways to bypass YouTube ads, and therefore remuneration for creators, which could be seen as the same as pirating.

Also, PG does recommend cloud storage solutions, which are terrible for the climate. But those are personal choices; Privacy Guides is focused on privacy and security above all. That being said, we could add some “warnings” if you have specific concerns.


I agree with the comment about PG not being pioneering, if by that you mean not recommending alpha/beta-quality software unless there is a strong reason for it. However, piracy and AI is not a good comparison.

To put things in perspective, the entire American economy is currently held up by the stock price of Nvidia because of how much euphoria there is over AI, and the understanding that companies will need to build out GPU datacenters to support the AI that is coming. AI is getting integrated into nearly every mundane thing now. My wife just got an annoying AI spellchecker in WPS Office; that is the level of hype surrounding it.

The challenge in the future is likely going to be much like the current challenge with Google tracking. It’s going to be hard NOT to use AI.

Piracy is really just the same software and media distributed at zero cost with no other redeeming social value, which is why the sites are always seedy, whereas AI is doing some kind of “value add”, whether you like that value or not.

The intellectual property complaints will be a moot point because intellectual property itself is an arbitrary legal fiction, and tech companies have already spun some legal webs to protect themselves regarding the data their models use. If the government confirms the AI companies are within their rights to use copyrighted media, then there is no longer the IP argument. The government gives and takes away.

Source?

That’s the legality argument not the ethical argument. The ethical argument is more along the lines of “we have a capitalist economic system, where artists make things to get paid, and this newfangled ayy eye thing is coming in and taking art without remunerating the artist for their work” and “rather than automating away mundane tasks, rich capitalists have decided they would much rather automate creative work which is slimy and shit”


Since a PR is already made, is there anything holding this back? @jonah @dngray

I know we generally wait for 20 votes, but really, when Jonah or another team member wants a new category, he basically just creates it with little (or no) feedback.* I would think it’s fair that this gets included. I have already done a lot of work on this, and it’s frustrating that it has stalled for months.

*Not saying this is bad; some topics spark less interest but are still interesting.


Alpaca is also great for Fedora.

Completely local. You can run any model and install it with a good UI.

I don’t think recommending a GUI client for ollama that is only available on Linux is in scope. There are plenty of clients for ollama, and there is no point in recommending each of them.

What is your point? Does Privacy Guides only recommend Linux as an OS? What will you do about supporting other OSes?

We generally want cross-platform tools. Not everyone uses Linux, and those users shouldn’t be excluded from tools.
