Remove SearXNG

If I searched “privacy guides” on two SearXNG instances that nobody used, Google could know that those searches belong to the same person (§ 1).

However, this doesn’t matter if you’re the only one using a particular SearXNG instance (§ 2).

Sorry, I should have put more emphasis on the last part when I said “diversifying that data across countless public instances (with sufficient traffic)”. You’re absolutely right, and I think that’s why the Privacy Guides website advises that “it is important that you have other people using your instance so that the queries would blend in” (Privacy Guides).

As for browser profiles, those shouldn’t be fully randomized, especially the user agent string since websites can identify you by detecting random, rare user-agent strings and grouping those…

Again, this is an important consideration for resisting fingerprinting. However, after reviewing the source code, it seems that SearXNG only uses recent, popular user agents (see utils.py, online.py, and useragents.json). Judging by the commit history, the list is also kept up to date.
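To illustrate the difference, here is a rough sketch of picking a user agent from a small pool of common values rather than generating a fully random one. This is not SearXNG's actual code: the template, OS strings, version numbers, and the function name `pick_common_user_agent` are all placeholders I made up for the example.

```python
import random

# Hypothetical pool of common values (illustrative only, not copied
# from SearXNG's useragents.json).
UA_TEMPLATE = "Mozilla/5.0 ({os}; rv:{version}) Gecko/20100101 Firefox/{version}"
OS_STRINGS = ["Windows NT 10.0; Win64; x64", "X11; Linux x86_64"]
VERSIONS = ["127.0", "128.0"]


def pick_common_user_agent(rng=random):
    """Return a user agent built only from the small, popular pool above."""
    return UA_TEMPLATE.format(
        os=rng.choice(OS_STRINGS),
        version=rng.choice(VERSIONS),
    )


ua = pick_common_user_agent()
print(ua)
```

Because every generated string comes from a handful of widely used combinations, a request with one of them should look like many other ordinary browsers, whereas a fully randomized string could end up rare enough to stand out.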

Would anyone know if a SearXNG instance, unknown email provider, or any other unknown service sold sensitive data to hackers?

This is a good point about the extension of trust which occurs. Assuming your question isn’t rhetorical, no: it seems like it would be quite hard to tell. However,

  • in the case of using trusted instances (such as your own, properly configured, or another that you trust, as I outlined, in both cases with sufficient traffic), this isn’t an issue.
  • in the case of randomly cycling between public instances, from what I understand, a particular public instance only receives a small portion of your search data. Given SearXNG’s privacy features, and considering my previous reply, it seems like a simple upgrade from

    • Google receiving all of your search data directly, in the knowledge of their malpractice,

    • to Google receiving all of your search data from a range of sources, with significant de-identification measures in place, and the potential for public SearXNG instance maintainers to be bad actors.

In both cases, Google receives all of your search data. In the first case, though, you know that the probability of attack is high and the consequence of attack is significant. In the second case, the probability of attack from Google is still high, but the feasibility is low, while the probability of attack from instance maintainers is very low (without evidence) and the consequence is also very low.

If your search data threat model does not tolerate those levels of risk, then using more stringent anti-identification measures, like using the Tor Browser or public library computers, might be more appropriate. Regardless, it seems to me that SearXNG maintains a clear use case in addition to tools like Brave Search or Startpage.
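To put a rough number on the “small quantity” point about cycling public instances, here is a minimal simulation. It assumes each query goes to one instance chosen uniformly at random, which is my simplification rather than anything SearXNG does for you. The instance names and the function `share_seen_per_instance` are made up for the example.

```python
import random
from collections import Counter


def share_seen_per_instance(num_queries, instances, seed=0):
    """Send each query to a uniformly random instance and return the
    fraction of all queries that each instance ends up seeing."""
    rng = random.Random(seed)
    counts = Counter(rng.choice(instances) for _ in range(num_queries))
    return {name: counts[name] / num_queries for name in instances}


# Hypothetical list of 20 public instances.
instances = [f"instance-{i}.example" for i in range(20)]
shares = share_seen_per_instance(1000, instances)
largest = max(shares.values())
print(f"largest share seen by any single instance: {largest:.1%}")
```

With 20 instances, each one sees about a twentieth of the queries on average. This says nothing about how identifying any single query is, only about how much of the overall history one maintainer could collect.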

Is this an accurate analysis? Please let me know if not!
