I’ve been mulling over an idea to use a commercial LLM for counseling in a private way.
While a VPN hides my IP address, I believe my writing style and word choice could still be used to fingerprint me. So, I’ve been contemplating a two-layer system to address this.
The idea is to run a less powerful open source LLM on my local machine. First, I’d write my query as I normally would, then this local LLM would sanitize my input, stripping away any unique writing styles or identifiable quirks. This sanitized version would then be passed to Claude-2.
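A minimal sketch of that two-layer pipeline might look like the following. The local rewrite step is stubbed out here (the `rewrite_neutral` function and its prompt are my own hypothetical names, not any real API); in practice it would call a locally hosted model, e.g. through llama.cpp, and only the sanitized output would ever be sent to the hosted service.

```python
# Hypothetical sketch of the two-stage pipeline: local sanitization,
# then forwarding only the neutralized text to the remote LLM.

REWRITE_PROMPT = (
    "Rewrite the following text in plain, neutral prose. Preserve the "
    "meaning, but remove idioms, unusual punctuation, and any "
    "distinctive phrasing:\n\n{text}"
)

def rewrite_neutral(text: str, local_llm=None) -> str:
    """Stage 1: sanitize locally, so the raw text never leaves the machine.

    `local_llm` is any callable that maps a prompt string to a completion
    (e.g. a wrapper around a llama.cpp model). Without one, we fall back
    to trivial whitespace normalization, which is NOT real sanitization.
    """
    if local_llm is not None:
        return local_llm(REWRITE_PROMPT.format(text=text))
    return " ".join(text.split())

def build_remote_query(text: str, local_llm=None) -> str:
    """Stage 2 input: only the sanitized version goes to the hosted LLM."""
    return rewrite_neutral(text, local_llm)
```

The key property is that the original, stylistically identifiable text exists only on the local machine; the remote provider sees just the neutralized rewrite.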
On paper, this seems like a solid plan, but I’m curious to hear your thoughts. Are there any potential pitfalls I’m overlooking? Could this be an effective method to reduce the risk of being identifiable by these companies?
The biggest drawback, I would say, is the hardware requirement for running something like that locally. From what I’ve seen (and this may already be outdated, so please correct me if I’m wrong), you need a pretty decent graphics card with enough VRAM to get even the most basic models to run.
You could try a translation service like DeepL and translate your query into multiple languages, then back into your original language. You could even alternate between languages as you switch VPNs to mask your location. I have no idea whether any of this would be enough to prevent identification, but it’s the first thing that popped into my mind when I was reading your post.
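The round-trip idea above is easy to sketch as a generic chain. The `translate` callable here is injected so the chain logic stays independent of any particular service; with DeepL you would pass a thin wrapper around its API client (assumption: you have an API key, which isn’t shown).

```python
# Sketch of a round-trip translation chain: EN -> DE -> JA -> EN.
# `translate(text, source_lang, target_lang)` is any translation
# backend; a no-op stub is enough to exercise the chain logic.

def round_trip(text, translate, chain=("DE", "JA"), source="EN"):
    """Pass `text` through each language in `chain`, then back to `source`."""
    langs = list(chain) + [source]
    prev = source
    for lang in langs:
        text = translate(text, source_lang=prev, target_lang=lang)
        prev = lang
    return text
```

Each hop loses some stylistic signal, which is the point, but it can also lose meaning, so a longer chain isn’t automatically better.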
Thanks for the feedback and suggestion!
You make a fair point about the hardware requirements for running an LLM locally. However, thanks to techniques like quantization, it’s becoming feasible to run some of the more basic models without exceptionally powerful hardware.
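A quick back-of-envelope calculation shows why quantization matters so much here. Weight memory is roughly parameter count times bytes per weight; this ignores activations and the KV cache, so real usage is somewhat higher.

```python
# Rough VRAM estimate for model weights alone:
# memory ≈ parameters × (bits per weight / 8), in decimal GB.

def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 7B-parameter model needs about 14 GB at fp16, but only about
# 3.5 GB at 4-bit quantization, which fits on a mid-range consumer GPU.
print(round(weight_memory_gb(7, 16), 1))  # 14.0
print(round(weight_memory_gb(7, 4), 1))   # 3.5
```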
The translation idea you mentioned is clever too! Though after thinking more about the situation, I’m not fully confident in my ability to avoid identification by commercial entities.
So I’ve decided to avoid using their services for sensitive matters and wait 1-2 years until I can afford the hardware for an open source model like Falcon-180B. It just got released and is already about as capable as GPT-3.5, which is really impressive.
Since open source LLMs are advancing rapidly, I’ll hold off until I’m able to run one entirely locally. Thanks again for the insightful ideas! I appreciate you taking the time to share your perspective.
Thank you for sharing that info on Falcon-180B. I’m also hoping to be able to run something like that locally, but I haven’t been paying much attention, as I thought it would still take some time before that becomes a feasible reality.
I just saw news that Intel and Nvidia are going to be making dedicated chips for running artificial intelligence programs, so things are looking bright on that front, although I’m sure this will somehow come with additional privacy concerns…