Araa (self-hosted Google metasearch engine)

I think it’s worth keeping an eye on Araa, a privacy-respecting search engine that uses Google results. It’s still at an early stage of development, but it’s blazing fast and seems to have a lot of potential. If you want a privacy-respecting frontend for Google but don’t want to deal with the gazillion options that SearX has, Araa might be a good choice. There is a Google frontend called Whoogle, but it was often rate limited…

Araa is self-hosted. Seems like a lot of hassle.

Is it any different from using Google in incognito?

I think the IP address doesn’t really carry weight.

@jonah any comments?

What about fingerprinting? But I agree that it seems kind of unnecessary given the already recommended tools.

The only data point Google has in incognito is the IP address (hopefully), and I think that doesn’t really carry weight.

Unless you’re using Tor, Mullvad, or Arkenfox-hardened Firefox, they have a lot more data points, including screen size, OS, user agent, canvas stuff, and more (just take a look at Arkenfox’s GitHub repo for more details).

Are Mullvad and Arkenfox available on mobile?

I will add that even some of the most advanced anti-detect browsers, primarily used in marketing (the anti-detect browser market is insane btw, not some small niche), can never truly mask all parameters.

Eventually, a site will always be able to keep track of the user. See it this way:

  • If a user alters too many browser parameters, the entity can track the user based on the resulting high uniqueness.

  • On the other hand, if the browser parameters are fairly or strongly generic, there is always that one parameter that’s unique to the user, be it fonts or hardware data.
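To make the two cases above a bit more concrete, here is a rough sketch in Python. The visitor list and the `anonymity_set_sizes` helper are made up for illustration (they are not from any real tracking tool); the idea is simply to count how many visitors share the exact same combination of parameters. A count of 1 means that visitor stands out on their own.

```python
from collections import Counter

# Hypothetical visitors: (user agent, screen size, timezone, font set).
# These values are invented for illustration only.
visitors = [
    ("Firefox/115", "1920x1080", "UTC", "fonts-a"),
    ("Firefox/115", "1920x1080", "UTC", "fonts-a"),
    ("Firefox/115", "1920x1080", "UTC", "fonts-a"),
    ("Firefox/115", "1920x1080", "UTC", "fonts-b"),     # generic, except for one unusual font set
    ("Custom/0.1", "1337x777", "UTC+5:45", "fonts-c"),  # heavily altered parameters
]

def anonymity_set_sizes(fingerprints):
    """For each visitor, return how many visitors share the same fingerprint."""
    counts = Counter(fingerprints)
    return [counts[fp] for fp in fingerprints]

print(anonymity_set_sizes(visitors))  # [3, 3, 3, 1, 1]
```

Both the heavily altered profile and the mostly generic one with a single odd parameter end up in a set of size 1, which is the point of the two bullets.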

But in essence, even the developers who make their living by creating anti-detect browsers admit that it’s oftentimes impossible to really mask a fingerprint.

That’s why search aggregators like SearX, or Araa in this case, are very useful. Instead of sharing your browser fingerprint with the search engine directly, you use a proxy, the VPS, to make the search for you. Add in a lot of users visiting that aggregator over time, and you get a better veil of anonymity. Of course, using a VPN mitigates the issue of the hoster knowing your IP, and it substantially increases privacy in my humble opinion.
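As a rough illustration of the proxy idea, here is a minimal Python sketch. It is not how Araa or SearX are actually implemented; `build_upstream_request`, `GENERIC_HEADERS`, and the `search.example` URL are hypothetical names I made up. The point it shows is that the aggregator builds the upstream request from its own fixed headers and only carries over the query, so nothing from the client’s browser reaches the search engine.

```python
from urllib.parse import urlencode

# Assumption: the aggregator sends the same fixed headers for every user.
GENERIC_HEADERS = {
    "User-Agent": "Mozilla/5.0 (generic)",
    "Accept-Language": "en-US,en;q=0.5",
}

def build_upstream_request(client_headers, query, upstream="https://search.example/search"):
    """Return (url, headers) for the upstream search engine.

    client_headers is deliberately ignored: cookies, user agent, and
    other client-specific values are never forwarded upstream.
    """
    url = f"{upstream}?{urlencode({'q': query})}"
    return url, dict(GENERIC_HEADERS)

# Two very different clients produce identical upstream requests.
alice = {"User-Agent": "Firefox/115", "Cookie": "session=abc"}
bob = {"User-Agent": "Custom/0.1", "Accept-Language": "ne-NP"}
print(build_upstream_request(alice, "araa search") == build_upstream_request(bob, "araa search"))
```

The search engine still sees the server’s IP, so the anonymity only improves when many users share the same instance.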

Hence I support this thread and ask the team to at least genuinely consider adding such projects to the guides.

Edit: This part will be controversial to say the least, so take my words with a grain of salt. There are other, much harder to mitigate ways of tracking as well, one of which is called TLS fingerprinting (see “What is TLS fingerprinting?”). It’s very hard to get rid of this method of fingerprinting. I know this for a fact because I am one of the oldest users in the Discord automation niche and have been in it for years. Since late 2022/early 2023, Discord has implemented TLS fingerprinting, due to which generated Discord accounts/mass DM accounts were getting instantly flagged irrespective of using residential proxies and a good user agent. People then fixed it by rotating TLS ciphers, which stopped the accounts from getting flagged.
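For anyone wondering why rotating ciphers would change anything, here is a small Python sketch of a JA3-style fingerprint. JA3 is one publicly documented way of fingerprinting a TLS ClientHello: it joins the TLS version, cipher suites, extensions, curves, and point formats into a string and takes the MD5 of it. I am only assuming something along these lines is in play; I don’t know what Discord actually uses, and the numeric values below are just example inputs.

```python
import hashlib

def ja3_style_hash(version, ciphers, extensions, curves, point_formats):
    """Build a JA3-style string (fields joined by ',', values by '-') and MD5 it."""
    fields = [
        str(version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Example ClientHello values (illustrative only).
version = 771
extensions = [0, 23, 65281, 10, 11]
curves = [29, 23, 24]
point_formats = [0]

original = ja3_style_hash(version, [4865, 4866, 4867], extensions, curves, point_formats)
rotated = ja3_style_hash(version, [4867, 4865, 4866], extensions, curves, point_formats)

# Same client, same ciphers, different order: the fingerprint no longer matches.
print(original != rotated)
```

Since the hash depends on the order of the cipher list, a client that shuffles its ciphers shows up with a different fingerprint each time, which would explain why rotation stopped the flagging.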

And even in the above controversial case, using a proxy search aggregator solves the issue of tracking.

Definitely, no question about it. Tor Browser is an excellent project, but my point is that even an excellent browser profile is not foolproof. My intention is to show that having an external search aggregator proxy your requests makes it more foolproof. That’s one of the reasons why privacy operating systems like TAILS are designed to route everything via Tor: they are based on the assumption that the user can’t be trusted to always stay on top of things and that the user can and will make mistakes.

The only downside of using these search aggregators is that, depending on your threat model, you may not like placing trust in the hoster/VPS company, and if an aggregator has no significant number of users, it definitely becomes a privacy concern and defeats the point of using one, as the entity can easily correlate the searches. That’s why I really hesitate to self-host these kinds of tools and would rather use public instances.

Or the fact that one had typed it at 11PM in the night while half dozing awayy :))))