What is the Best Option for Running Open-Source AI Locally in Firefox Chat AI?

I’m looking to run an open-source AI model locally and came across Ollama and Llamafile. Since Privacy Guides recommends Ollama, I assume it’s well-supported, but I also found Llamafile, which focuses on portability and performance.

However, I plan to use this within Firefox Chat AI, which currently only supports:

  • Anthropic Claude
  • ChatGPT
  • Google Gemini
  • HuggingChat (Open Source)
  • Le Chat Mistral (Open Source)

From what I understand, HuggingChat and Le Chat Mistral are the only open-source options, but none of them (or the others) appear to support end-to-end encryption (E2EE). This makes me wonder:

  1. Can Ollama or Llamafile be integrated into Firefox Chat AI somehow?
  2. Which one would be easier to set up for a beginner?
  3. Which offers the best performance for local AI inference?
  4. Are there any secure and private alternatives that work within Firefox Chat AI?
  5. Privacy concerns – Mozilla has stated that Firefox AI features aim to prioritize user privacy, but recent concerns over its Terms of Use and integration with third-party AI providers have raised questions. How much control does Firefox Chat AI actually give users over their data? Is there a risk of data collection, and can this be mitigated?

What does “end to end encryption” mean to you in the context of a large language model?

I’m not sure it is a concept that makes a lot of sense in the context of an LLM or search engine (with currently available tech).

which currently only supports [list of providers]

Local models are supported as well.

  • Can Ollama or Llamafile be integrated into Firefox Chat AI somehow?

Yes, if you already have a local LLM set up, it's quite straightforward. Firefox built in the ability to use local models from the start; you just need to flip a few preferences in about:config.

Change:

  • browser.ml.chat.hideLocalhost to false
  • browser.ml.chat.provider to whichever address and port number your local LLM listens on (e.g. localhost:8080)
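If you want to sanity-check that something is actually listening at that address before you point Firefox at it, a quick request from Python is enough. This is just a sketch that assumes a local server at localhost:8080 (the example above); swap in whatever address and port your own setup uses:

    # Quick check: is anything answering at the address you plan to put in
    # browser.ml.chat.provider? (localhost:8080 is only an example.)
    import urllib.request

    url = "http://localhost:8080"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            print("Got HTTP", resp.status, "- something is listening at", url)
    except Exception as err:
        print("Could not reach", url, "-", err)

If that prints an HTTP status, the same address is what goes into browser.ml.chat.provider.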

I like ollama personally, but llamafile is really easy in terms of getting started (it's essentially 1-click/zero config).

Other Qs and As
  • Which offers the best performance for local AI inference?

Should be fairly similar regardless of the software you use. Hardware (specifically memory bandwidth and capacity) is by far the most important factor.
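As a rough rule of thumb (not a benchmark): for single-user generation, an upper bound on speed is roughly memory bandwidth divided by the size of the model weights, since each generated token has to stream essentially the whole model through memory. The numbers below are illustrative assumptions, not measurements:

    # Back-of-the-envelope estimate: tokens/s ≈ memory bandwidth / model size,
    # because each token reads roughly the full set of weights from memory.
    # All figures below are illustrative assumptions.
    model_size_gb = 4.0        # e.g. a ~7B-parameter model at 4-bit quantization
    cpu_bandwidth_gbs = 50.0   # typical dual-channel desktop DDR4
    gpu_bandwidth_gbs = 500.0  # typical discrete-GPU VRAM

    print(f"CPU estimate: ~{cpu_bandwidth_gbs / model_size_gb:.0f} tokens/s")
    print(f"GPU estimate: ~{gpu_bandwidth_gbs / model_size_gb:.0f} tokens/s")

Capacity matters in the same way: if the model doesn't fit in RAM/VRAM, it will run far slower than either estimate.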

  • Are there any secure and private alternatives that work within Firefox Chat AI?

Every locally hosted LLM should be secure and private assuming you trust the software running it (e.g. ollama).

How much control does Firefox Chat AI actually give users over their data?

Assuming you self-host the LLM, you have total control. If you don't, you are trusting whoever operates the model, which is essentially the same amount of trust you put in your search engine: just as your search provider can (and must) process all of your search queries, a remote LLM must also process all of your prompts.

  • Are there any secure and private alternatives that work within Firefox Chat AI?

You can point Firefox's AI sidebar at essentially anything (local or remote) using the second about:config preference I mentioned above, but some things will work better than others. I use Duck.AI in the sidebar, though it's important to note that neither Firefox nor DuckDuckGo intends this, so not everything works perfectly.


Hi xe3,

Thank you for the clarification and help on this issue. I was able to change browser.ml.chat.hideLocalhost to false and browser.ml.chat.provider to http://localhost:11434.

This is where I get lost; I'm not knowledgeable about port numbers, so that's all new to me. The one above is the port I found for Ollama. The AI chatbot now says "Ollama is running", but there is no place for me to begin writing prompts.

Could you explain to me what Ollama does and why I should use it over the other options?

You shouldn't necessarily use it over other options, but it's what I use, and it's one of the more popular choices. It may or may not be a good fit for you. It's just one of many ways you can run an LLM locally.

Some of the reasons ollama is popular: (1) it has a big community, (2) it's pretty flexible and featureful, (3) it works well on a server or in a docker container, (4) it's open source, and (5) it works with various frontends (such as OpenWebUI, which is what I use). ollama (which runs the model and makes it available via an API) and OpenWebUI (a web frontend that connects to ollama) are a very complementary combination.

http://localhost:11434 […] This is where I get lost; I’m not knowledgeable on what port number I should use

iirc the address you tried is the address that ollama's API is set up to use. I don't think that'll work directly in Firefox (but I haven't actually tried). I think you need to connect ollama to a frontend like OpenWebUI first, and then connect that to Firefox.
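To illustrate the difference: what lives at http://localhost:11434 is ollama's JSON API, not a chat page, which is why Firefox shows "Ollama is running" but gives you nowhere to type. Here is a minimal sketch of talking to that API directly from Python (the model name is just an example; use one you have actually pulled):

    # Minimal sketch of calling ollama's REST API directly (this is what sits
    # behind http://localhost:11434). "llama3.2" is only an example model name.
    import json
    import urllib.request

    payload = {
        "model": "llama3.2",
        "prompt": "Say hello in one sentence.",
        "stream": False,
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["response"])

A frontend like OpenWebUI sits on top of that API and gives you the actual chat page, which is what you would then point the sidebar at.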

If you are looking for something easier and simpler, a llamafile is a great way to go. It has less complexity than ollama, but also fewer features and less flexibility. For integration in the sidebar, fewer features shouldn't really matter much. The advantage of llamafiles is that they bundle (1) the model, (2) the software to run the model, and (3) the WebUI in a single executable. My recollection is that llamafiles are essentially one-click: they run, automatically open in your browser, and then you can just copy that address:port to hook it up to the AI sidebar.

My general advice is to first figure out what software you'd like to use to run LLMs locally, familiarize yourself with it, get it up and running, wrap your head around the basics, read the docs, and figure out how to access your chosen software from a browser; then move on to setting it up to work with Firefox's sidebar. Trying to integrate it first, before you have even begun to understand the software, is approaching it in the wrong order, in my opinion.

Both ollama and OpenWebUI have pretty good documentation on their GitHub pages and elsewhere, and decent-sized communities, so there is lots of info out there. Once you figure all that out, integrating into the browser is quite straightforward. If you just want to get up and running quickly, use a llamafile.