Remove SearXNG

Honestly, I don’t see any reason to use a SearXNG instance over Google. SearXNG is essentially a MiTM (Man-in-the-Middle) between you and the search engines it uses. Not only am I sending my search queries to SearXNG, but also to all of the search engines it uses, including Google.

If I used someone else’s SearXNG instance, I would not only have to trust them with my IP address and searches, but I would also have to trust Google, Bing, and all the other search engines it uses with my searches, even if they don’t see my IP address. Wouldn’t it make much more sense to just use Google and use a VPN to hide my IP address from them?

If I self-hosted a SearXNG instance, Google and the other search engines it uses could still see my true IP address unless I used a VPN or Tor. This isn’t a privacy gain. If anything, it’s a loss, because I’m now sending my search queries to multiple search engines instead of just one.

But if I used Google, I would only have to trust one search engine with my searches (Google), as opposed to several search engines plus the SearXNG instance maintainer, and like I said, I can use a VPN to hide my IP address. Or let’s say my VPN gets blocked. It’d be much safer to use Startpage or Brave Search if I wanted to get Google results, because not only am I reducing the number of search engines to trust (just Google and Startpage/Brave), but I’m also not standing out as much, since far more people use Startpage than any one SearXNG instance. If I were using SearXNG, Google could still know it’s me, since chances are that few people, if any at all, are using that specific instance. I don’t believe SearXNG increases privacy at all; if anything, it reduces it.

So my proposal would be to remove SearXNG and add Google in its place. I think most people should just use Google Search unless their threat model for some reason requires not connecting directly to Google. Realistically, I don’t see why anyone should want to “DeGoogle”. Nobody should assume that just because something isn’t Google, it can be trusted. If anything, Google services are more trustworthy than any self-hosted provider, since more people use the services and there are more eyes on Google.

Finally, if for some reason one doesn’t want to use JavaScript, Whoogle is less bad than SearXNG, since it only fetches results from one search engine as opposed to several.

I disagree. Have you considered cycling SearX instances? There is https://searx.neocities.org/ that does that, and it works with the OpenSearch protocol too.

Spreading your search habits and metadata across different SearX instances is miles better than giving everything to Google.

No, because you’re still trusting the SearXNG instance maintainers by using the instances. It’s not easy to verify whether a SearXNG instance is malicious, and malicious SearXNG instances could be ubiquitous (unlikely, but possible). Remember that your search queries will still be sent to Google if the instances use Google search, so all your searches are sent to Google regardless. Using an extension like LibRedirect is even worse, as it now has access not only to all your searches but to all data across all websites you visit. This is why you should never use any extensions, not even uBlock Origin, but I digress.

And think about what I said. Most people don’t use SearXNG, so if Google gets a search from an IP address belonging to a SearXNG instance, Google might still know it’s you. Startpage is a lot more popular, and by using that, Google might not know it’s you because so many people use it. Now let’s consider Tor. Part of the reason why Tor is good for privacy and anonymity is that so many people use it. If only journalists and whistleblowers used Tor, they could be tracked and deanonymized easily because they would be the only ones using it. Again, I digress.
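To put rough numbers on the crowd argument, here is a back-of-the-envelope sketch. It assumes every user behind a shared exit looks identical and searches equally often, which is a simplification, and the user counts are invented for illustration rather than real figures for any service.

```python
def attribution_chance(active_users: int) -> float:
    """Chance that one query leaving a shared exit is yours,
    assuming all users are indistinguishable and equally active."""
    if active_users < 1:
        raise ValueError("need at least one user")
    return 1 / active_users


# Hypothetical user counts, chosen only to show the scale of the difference.
for label, users in [("near-empty instance", 2), ("popular service", 1_000_000)]:
    print(f"{label}: {attribution_chance(users):.6%}")
```

With one user the chance is 100%, which is the case I keep coming back to.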

Google search does have good security as it can filter out malicious websites. If this community doesn’t trust Google, at least still remove SearXNG for all the reasons I mentioned and list Brave Search or Startpage as the top recommendation for those who want Google results. Using SearXNG is a false sense of privacy.

From what I understand, SearXNG strips tracking content from things like URLs and HTTP headers, generates random browser profiles, doesn’t send cookies to the search engines, and hides the search query and referring page from result pages. I haven’t verified the source code and I don’t have technical knowledge about how effective those measures are, but they seem to offer a straightforward privacy benefit, assuming you use a trusted instance (as the documentation recommends).
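To show the kind of URL cleaning I have in mind, here is a minimal sketch. It is not SearXNG's code, and the parameter names in it are my own assumption about what commonly counts as tracking content.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of tracking parameters; not taken from SearXNG.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_NAMES and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_tracking("https://example.org/page?id=7&utm_source=news&fbclid=abc"))
```

Only the `id` parameter survives here; everything that matches the assumed tracking list is dropped.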

Secondly, is using random public SearXNG instances over Tor or a trusted VPN not a strong mitigation technique for behavioural profiling methods? I assume that exposing all of your search data to Google is less effective—if you’re trying to do something like minimize the data you expose to surveillance capitalists—than diversifying that data across countless public instances (with sufficient traffic), so that each instance only ever receives a small part of your search data and so that Google only receives your search data from a range of traffic streams.
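As a rough illustration of what I mean by diversifying, here is a small simulation. The instance URLs are made up and nothing here contacts a real server; it only counts how many of your queries each instance would see if you picked one at random per search.

```python
import random
from collections import Counter

# Hypothetical instance URLs, for illustration only.
INSTANCES = [
    "https://searx-a.example",
    "https://searx-b.example",
    "https://searx-c.example",
    "https://searx-d.example",
]


def pick_instance(rng: random.Random) -> str:
    """Choose one instance uniformly at random for a single query."""
    return rng.choice(INSTANCES)


def simulate(num_queries: int, seed: int = 0) -> Counter:
    """Count how many queries each instance would receive."""
    rng = random.Random(seed)
    return Counter(pick_instance(rng) for _ in range(num_queries))


counts = simulate(100)
for instance, seen in sorted(counts.items()):
    print(f"{instance} saw {seen} of 100 queries")
```

Each instance sees only a slice of the total, but if they all forward to Google, Google still receives all 100 queries, just from several different sources.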

This is really interesting to consider, though, and (like I said) I’m far from an expert on this topic :slight_smile:


The thing is, randomization isn’t effective and it can be detected. If I searched “privacy guides” on two SearXNG instances that nobody used, Google could know that those searches belong to the same person. Even if the SearXNG maintainers don’t see all of your searches, Google still will if all the instances use Google results. As for browser profiles, those shouldn’t be fully randomized, especially the user-agent string, since websites can identify you by detecting random, rare user-agent strings and grouping those. For example, if you used a Chrome 40 user agent and then a Firefox 12 user agent, they would likely know you’re randomizing your fingerprint, since nobody else does that, nor does anyone use ancient browser versions.

While SearXNG can strip tracking content and doesn’t send cookies to identify you, Startpage and Brave Search do the same. However, this doesn’t matter if you’re the only one using a particular SearXNG instance. The problem with random instances is there’s nobody watching them. Would anyone know if a SearXNG instance, unknown email provider, or any other unknown service sold sensitive data to hackers?

Sending search queries to other search engines is something all metasearch engines do, but the risks can be minimized by using a search engine that’s both popular (so that everyone is uniform) and only fetches results from one crawler.

I don’t think we are interested in removing SearXNG at this time. There are plenty of advantages to using a search engine frontend like this that people might be interested in, including but not limited to:

  1. Aggregating search results from multiple providers gives you the most complete possible search results; some search engines are known or rumored to censor/demote various topics.
  2. SearXNG links directly to results, so there’s no outbound link tracking to see exactly which result you visit.
  3. There are many different public providers, some of which are well known privacy activism organizations. Many people are likely to trust these providers over Google. Additionally, most large search engines are based in the United States, so this lets you trust a data provider in a specific jurisdiction if you wish. For example, EU citizens may prefer their data to remain in their home country or the EU.

Even self-hosting a private instance can provide some benefit. While your searches will be coming from a unique IP address, this is not any worse than directly connecting to the search provider, and the entire frontend code you’re visiting in your browser is code you trust and control.

I see a lot of pro-Google posters face this same confusion. Just because we are saying to not trust Google does not mean we’re saying you should trust everything that isn’t made by Google, don’t mix this up!

Avoiding tracking networks like Google and Facebook is specifically part of what we do here and is one of our common threat models, so recommending that people just use Google just because they’re big is a non-argument.


If I searched “privacy guides” on two SearXNG instances that nobody used, Google could know that those searches belong to the same person (§ 1).

However, this doesn’t matter if you’re the only one using a particular SearXNG instance (§ 2).

Sorry, I probably should have been more emphatic when I said “diversifying that data across countless public instances (with sufficient traffic)”: you’re absolutely right, and I think that’s why the Privacy Guides website advises that “it is important that you have other people using your instance so that the queries would blend in” (Privacy Guides).

As for browser profiles, those shouldn’t be fully randomized, especially the user agent string since websites can identify you by detecting random, rare user-agent strings and grouping those…

Again, this is an important consideration to resist fingerprinting. However, after reviewing the source code, it seems that SearXNG only uses recent, popular user-agents (see utils.py, online.py, and useragents.json). Judging by the commit history, they are also kept up-to-date.
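To illustrate the difference from fully random user agents, here is a minimal sketch of drawing from a small pool of recent, common values. This is not SearXNG's actual code; the template, version numbers, and OS strings below are placeholders I made up.

```python
import random

# Placeholder values for illustration; not copied from useragents.json.
UA_TEMPLATE = "Mozilla/5.0 ({os}; rv:{version}) Gecko/20100101 Firefox/{version}"
RECENT_VERSIONS = ["128.0", "129.0"]
OS_STRINGS = ["Windows NT 10.0; Win64; x64", "X11; Linux x86_64"]


def pick_common_useragent(rng: random.Random) -> str:
    """Build a user agent from a short list of current, widely used values."""
    return UA_TEMPLATE.format(
        os=rng.choice(OS_STRINGS),
        version=rng.choice(RECENT_VERSIONS),
    )


print(pick_common_useragent(random.Random()))
```

Because every possible output is something a lot of real browsers also send, the rare "Chrome 40 then Firefox 12" pattern described above cannot occur with a pool like this.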

Would anyone know if a SearXNG instance, unknown email provider, or any other unknown service sold sensitive data to hackers?

This is a good point about the extension of trust which occurs. Assuming your question isn’t rhetorical, no: it seems like it would be quite hard to tell. However,

  • In the case of using trusted instances (such as your own, properly configured, or another that you trust, as I outlined, in either case with sufficient traffic), this isn’t an issue.
  • In the case of randomly cycling your use of public instances—from what I understand—a particular public instance only receives a small quantity of your search data. Given SearXNG’s privacy features, and considering my previous reply, it seems like a simple upgrade from

    • Google receiving all of your search data directly, in the knowledge of their malpractice,

    • to Google receiving all of your search data from a range of sources, with significant de-identification measures in place, and the potential for public SearXNG instance maintainers to be bad actors.

In both cases, Google receives all of your search data. In the first case, however, you know that the probability of attack is high and the consequence of attack is significant. In the second case, the probability of attack from Google is still high, though its feasibility is low, while the probability of attack from instance maintainers is very low (without evidence) and the consequence is also very low.

If your search data threat model does not tolerate those levels of risk, then using more stringent anti-identification measures, like using the Tor Browser or public library computers, might be more appropriate. Regardless, it seems to me that SearXNG maintains a clear use case in addition to tools like Brave Search or Startpage.

Is this an accurate analysis? Please let me know, if not!


Collaborator from LibRedirect here. LibRedirect only redirects URLs based on a regex pattern, and it doesn’t store data except its own settings or transmit data anywhere on the internet. You can verify for yourself whether we log data or not.
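For readers unfamiliar with the approach, regex-based redirection can be sketched roughly like this. This is not the extension's code (which runs in the browser, not in Python); the pattern and the target instance are made-up examples.

```python
import re
from typing import Optional

# Hypothetical rule: send Google searches to a made-up SearXNG instance.
RULES = [
    (
        re.compile(r"^https?://(?:www\.)?google\.com/search\?(?P<query>.*)$"),
        "https://searx.example/search?{query}",
    ),
]


def redirect(url: str) -> Optional[str]:
    """Return the rewritten URL if a rule matches, otherwise None."""
    for pattern, target in RULES:
        match = pattern.match(url)
        if match:
            return target.format(**match.groupdict())
    return None


print(redirect("https://www.google.com/search?q=privacy+guides"))
print(redirect("https://example.org/"))
```

The function is a pure string rewrite: it takes a URL in and hands a URL back, with no storage and no network calls.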


what.the.actual.f***

Are you in your right mind? You come to a forum dedicated to privacy and evangelize totally anti-privacy software/services: Edge browser, Google Search. You really, unironically think that we, a project wholly dedicated to privacy, should recommend that people use Google? Jesus Christ… it’s unbelievable and ridiculous, I have no words.

What you are saying on this forum is totally antagonistic to achieving privacy. I’m really glad you haven’t visited this forum since January. You clearly seem to be a troll: hijacking a pro-privacy forum by posting 100% anti-privacy recommendations. If you are not a troll, I cannot find any sane explanation for your behavior, specifically for what makes you come to a pro-privacy forum and sabotage it by spreading anti-privacy stuff. Jonah even wasted his time replying to you when you recommended that people use Edge in another thread and Google in this thread.

I’m sure you are 100% aware of the privacy implications of using Google. This proves you are a troll: you recommend Google while knowing that it is not a private search engine in any way. While Google Search is, in my opinion, the most productive, convenient, fast, feature-rich search engine there is, it is anything but private.

If I sound rude in my message, it is because I’m irritated when people engage in bullsh***ing. But that’s the main purpose of a troll — to provoke someone and drive them mad. The agenda of PrivacyGuides is, obviously — privacy. Privacy and security. You are fully aware that Google is anything but private, and you still spread your anti-privacy propaganda.

I can now vividly feel and understand Micay’s behavior: it is very difficult to handle bullsh***ing when there’s so much of it around. It really takes a toll on one’s behavior, and it’s taxing on one’s psyche. But that’s a whole other topic, one I could write a book about.
