A new opt-in AI integration is being tested in Firefox Nightly, with the option to use an offline, on-device, private LLM

Disclaimer: Firefox Nightly is the alpha (pre-beta) release of Firefox. Not all Nightly experimental features make it into Firefox, and the feature I am going to talk about is still hidden even in Nightly (though it can be exposed via about:config), so consider it pre-pre-beta, and a rough draft.

TL;DR for those who don’t want to read my walls of text: an opt-in AI integration included in Firefox Nightly lets you plug in your own LLM, offline, on-device, and privately.


Good news today for Firefox Nightly testers interested in AI/ML.

Mozilla developers have introduced an optional, opt-in experiment available to Nightly testers that integrates an AI chatbot into the browser sidebar. This wouldn’t be good privacy news if not for this passage mentioned briefly at the end of the announcement:

Nightly can be configured by advanced testers to use custom prompts and any compatible chatbot, such as llamafile, which runs on-device open models, including open-source ones. We are excited for the community to share interesting prompts, chatbots, and models as we make this a better user experience. We are also looking at how we can provide an easy-to-set-up option for a private, fully local chatbot as an alternative to using third-party providers.

If the above isn’t clear: Firefox is making it possible to integrate your own locally hosted, private LLM into the browser, running on your own hardware! :clinking_glasses:

This is exciting news to me, and it’s one example of why I’ve supported and defended Mozilla’s choice to get involved in the AI space early on. I hope other browsers will follow suit!

This works with Mozilla’s own format for offline LLMs, llamafile, as well as whatever other offline (and possibly custom online) LLM software you use; you just have to point it at ip:port. I’m currently testing it with Ollama/Open WebUI, which is the software I use, running Llama-3-8B.
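
Before pointing the sidebar at a local front end, it can be worth confirming the backend itself is answering at that ip:port. Here is a minimal sketch that pokes a local Ollama server over its HTTP API; it assumes Ollama’s default port (11434) and an already-pulled llama3 model, so adjust host, port, and model for your own setup. (On the build I tried, the relevant about:config prefs lived under browser.ml.chat.*, but the names may change between Nightlies.)

```python
# Sanity-check a local Ollama backend before wiring the Firefox sidebar to it.
# Assumes Ollama's default port (11434) and an already-pulled "llama3" model;
# change OLLAMA and the model name to match your own setup.
import json
import urllib.request

OLLAMA = "http://127.0.0.1:11434"

# List the models the local server knows about.
with urllib.request.urlopen(f"{OLLAMA}/api/tags") as resp:
    print("local models:", [m["name"] for m in json.load(resp)["models"]])

# Send one non-streaming prompt to confirm generation works end to end.
req = urllib.request.Request(
    f"{OLLAMA}/api/generate",
    data=json.dumps({"model": "llama3",
                     "prompt": "Say hello in five words.",
                     "stream": False}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```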

Screenshot from my testing (updated to correct my mistake):

More screenshots of the UI:

[Screenshots from 2024-06-25, 15-05-08 and 15-05-20]

Because it works with whatever backend you already use or set up, it offers a lot of flexibility and potential, and much more user control and privacy.

There are some other potentially cool things that this approach enables. It appears possible to get this set up with SearXNG, to integrate locally hosted search into the locally hosted LLM, all accessible right from the browser! :sunglasses:
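
I haven’t wired this into the sidebar yet, so treat the following as a rough sketch of the idea rather than a working integration: it assumes a locally hosted SearXNG instance at localhost:8888 with its JSON output format enabled, plus Ollama at its default port, and simply feeds the top search results to the local model as context.

```python
# Rough sketch of "local search + local LLM": fetch a few results from a
# self-hosted SearXNG instance and hand them to the local model as context.
# The SearXNG URL/port (and its JSON format being enabled) and the Ollama
# port/model are assumptions about one particular setup.
import json
import urllib.parse
import urllib.request

SEARX = "http://127.0.0.1:8888"
OLLAMA = "http://127.0.0.1:11434"

def local_search(query: str, limit: int = 3) -> list:
    """Return the top results from the local SearXNG JSON API."""
    qs = urllib.parse.urlencode({"q": query, "format": "json"})
    with urllib.request.urlopen(f"{SEARX}/search?{qs}") as resp:
        return json.load(resp)["results"][:limit]

def ask_local_llm(prompt: str, model: str = "llama3") -> str:
    """Send one non-streaming prompt to the local Ollama server."""
    req = urllib.request.Request(
        f"{OLLAMA}/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

question = "What is llamafile?"
context = "\n".join(f"- {r['title']}: {r.get('content', '')}" for r in local_search(question))
print(ask_local_llm(f"Using only this context:\n{context}\n\nAnswer the question: {question}"))
```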


One obvious downside: you’ll need a beefy computer with 16 GB of RAM (32 would be better) and an equally powerful CPU/GPU, ideally with some hardware “AI cores”, for this to work well.

“Need” is an overstatement.

But you are right that locally running a powerful LLM does heavily benefit from a lot of VRAM and memory bandwidth.

That said, you can do more than you think with less than you think.

I don’t even have a GPU, and my CPU is a six-year-old ultra-low-power laptop CPU. I can run Llama-3-8B at annoyingly slow but still usable speeds, or smaller models at somewhat acceptable speeds.

What the local LLM crowd seems to consider good entry-level hardware is an RTX 3060 12GB. I’m desktopless, so I have to make do with CPU only.

Also, anyone with a MacBook M1/2/3/4 Pro/Max/Ultra and 16GB or more of unified memory should be in a good place to run a decent model.
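
For anyone curious what CPU-only actually looks like in practice, here is a small sketch using llama-cpp-python with a quantized GGUF file, which is the usual way to squeeze a model onto modest hardware; the model path and thread count are placeholders for whatever you have locally.

```python
# CPU-only inference with a quantized model via llama-cpp-python
# (pip install llama-cpp-python). The GGUF path below is a placeholder;
# heavier quantization and smaller models trade quality for speed on weak hardware.
from llama_cpp import Llama

llm = Llama(
    model_path="models/Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",  # your local file
    n_ctx=2048,    # modest context window to keep RAM use down
    n_threads=4,   # roughly match your physical cores
)

out = llm("Q: Why run an LLM locally?\nA:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"].strip())
```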

This is pretty cool!

The UI seems very polished. Do I see that there is speech recognition? And is the + button for file uploads?
The UI depends on the tool you use.

Edit: tested it with a Llama 3 llamafile



Firefox Assist context menu

BTW, I modified the base prompt a bit so it’s named “Firefox Assist”.

Edit 2: works great with HuggingChat, but gives an error message with Mistral (Le Chat).

It entirely depends on the size of the models. The smaller models are much more suitable for older hardware.


Sounds like you answered your own question, but yeah, it seems that right now (in its current form) the UI will depend on the service or self-hosted option you use. What you are seeing in my screenshot is what it looks like set up to use Ollama + Open WebUI.
