Mojeek

Mojeek is a search engine with its own index and no trackers. It is not perfect, but it returns good answers for most searches. It also meets all of the requirements.

Mojeek is one of the few large independent indexes left. Gigablast appears to have fallen off the map recently, leaving Brave and Mojeek as the largest independent indexes with good privacy policies.

Mojeek goes a step further with their independence: they don’t depend on AWS, Azure, Google Cloud, or any of those big hosting companies. They host their servers at a local UK company (search for “Mojeek and the Environment” to find their blog post on this).

Like Brave Search, Mojeek also doesn’t serve Microsoft Ads, which all Bing proxies like DuckDuckGo and Ecosia are required to do by the Bing API’s terms of service.

Their results are not amazing for certain types of queries; programming queries, for example, are hit-or-miss. They also mainly index English sites, though they index French, Spanish, German, and pages in a few other languages to a lesser degree. They don’t index pages in Japanese, for example. However, for general queries, they are good and improving all the time.

If Mojeek can’t fit into Privacy Guides’ recommendations because of their search quality, there should be clear requirements surrounding search quality.

Brave Search, because of how Tailcat (the index it’s based on) collected search activity from Firefox users who used other search engines like Google, tends to return results that resemble Google’s. Their current indexing strategy seems to be similar, though I won’t claim to have looked into it much beyond the initial Cliqz stuff. Whether this is a good or bad thing depends on the types of results you’d like to see.

Mojeek returns results based on the keywords you provide, rather than attempting to guess intent—it uses Lexical Search: https://blog.mojeek.com/2022/12/how-to-search.html

Mojeek uses lexical not semantic search. In other words Mojeek looks for explicit matches to the words or phrases in your search query. These search queries are compared with text in the webpages that we have indexed and their incoming links. Mojeek does not attempt to interpret meaning from your query. So you should be explicit about what you are seeking.
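
To make the lexical-versus-semantic distinction concrete, here is a minimal sketch of explicit word matching against page text and incoming link text. It is only an illustration of the idea in the quote above, not Mojeek’s actual ranking: the `tokenize` and `lexical_score` helpers and the two example pages are made up for this post.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_score(query, page_text, anchor_texts=()):
    """Count how many query words literally appear in the page text
    or in the text of links pointing at the page (hypothetical scoring)."""
    words = set(tokenize(page_text))
    for anchor in anchor_texts:
        words.update(tokenize(anchor))
    return sum(1 for word in tokenize(query) if word in words)


# Two made-up pages about the same topic, worded differently.
page_a = "How to fix a flat bike tire at home"
page_b = "Repairing a punctured bicycle tyre"

query = "fix flat bike tire"
print(lexical_score(query, page_a))  # every query word is on page A
print(lexical_score(query, page_b))  # no query word is on page B
# An incoming link whose text uses the query words gives page B a match.
print(lexical_score(query, page_b, anchor_texts=["fix a flat tire"]))
```

A semantic engine would likely treat both pages as relevant to the query; a purely lexical one only sees the literal overlap, which is why being explicit in the query matters.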


I also like Mojeek, but I understand why it might not be listed. I think a link to searchengine.party or a similar resource would be good for those who want more options, since there are so many that could be brought up or listed.


I think Seirdy’s fantastic search engine article (which is periodically updated) is a better place to link to: A look at search engines with their own indexes - Seirdy

It’s not solely privacy-focused, but this is the most detailed article I’ve ever found about this topic on the web.


It’s not just keywords anymore, per:

https://blog.mojeek.com/2024/02/major-algorithm-update.html


I think they still have some work to do.

privacyguides.org is the 6th result:

Mozilla’s “Privacy Not Included” guide is also at #9.

That’s not bad, but not amazing. Trying another search, “bootstrap docs”, I expect to get the docs for the 5.x series, but instead get the docs for 3.3 rather than 5.3 (10+ years out of date), and because clustering is limited to 1, that’s the only result from that domain I get:

It’s not too bad; I can click on the page, then click on “there’s a newer version of Bootstrap!” to get to the latest docs. But certainly not optimal.
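
For anyone unfamiliar with clustering, here is a rough sketch of what a per-domain limit does to a ranked result list. The function name `cluster_by_domain`, the ranking order, and the `example.org` URL are assumptions for illustration; I don’t know how Mojeek implements this internally.

```python
from collections import defaultdict
from urllib.parse import urlparse


def cluster_by_domain(ranked_urls, max_per_domain=1):
    """Keep at most `max_per_domain` results per host, preserving rank order."""
    seen = defaultdict(int)
    kept = []
    for url in ranked_urls:
        host = urlparse(url).netloc
        if seen[host] < max_per_domain:
            kept.append(url)
            seen[host] += 1
    return kept


# Hypothetical ranking where the old docs happen to outrank the new ones.
ranked = [
    "https://getbootstrap.com/docs/3.3/",
    "https://getbootstrap.com/docs/5.3/",
    "https://example.org/bootstrap-tutorial",
]

print(cluster_by_domain(ranked, max_per_domain=1))
print(cluster_by_domain(ranked, max_per_domain=2))
```

With a limit of 1, the 5.3 docs are dropped simply because the 3.3 page from the same domain ranked higher; raising the limit to 2 would let both through.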

But if I search for Symfony docs, I get some really great results:

Not only do I get the latest docs page as the first result, I also get a link to the GitHub page for those docs and a result for the Twig docs, which would obviously be handy if I’m working with Symfony. My only complaint is that the Twig result is for the 2.x series, which was EOL as of December 2023, rather than the current 3.x series.

This is a microcosm of my experience with Mojeek. Some good and even great results in some areas, and some salvageable and bad results. Overall, it has gotten a lot better over the years and I find myself using other search engines less and less. I’m excited for what’s to come.

…and awaiting the day Mojeek starts indexing Japanese results.


I would say it’s bad enough.

Every recommended search engine on PG shows PG first for that exact same search query.

It’s just a sign that the algorithm is not built out enough to give the level of reliability that the already recommended engines offer. Which raises the question: “what does Mojeek offer that’s not already being offered by the PG recommendations?”


Your source talks about a Firefox experiment, not Brave Search.


I suggest reading more about Cliqz.

In August 2016, Mozilla, the developer of Firefox, made a minority investment in Cliqz. Cliqz planned to eventually monetize the software through a program known as Cliqz Offers, which would deliver sponsored offers to users based on their interests and browsing history. However, these recommendations would be processed locally based on a remote repository of offers, with no personally identifiable data sent to remote servers.[9]

On 15 February 2017, Cliqz International GmbH, a wholly-owned subsidiary of Cliqz GmbH, acquired the privacy-oriented browser extension Ghostery.[10][11]

On 29 April 2020, Cliqz announced it will shut down its browser and search engine.[12] Subsequently, the search engine - called Tailcat - was acquired by Brave.[13]

This Reddit comment sums it up: Antabaka comments on Mozilla ships Cliqz experiment in Germany for ~1% of new installs, collects surf data, including URLs

Edit: I should probably state outright that Brave bought Cliqz’ search engine Tailcat, which Cliqz originally owned, and which partnered up with Mozilla to train its index based on what websites Firefox users visited, which they usually found with Google. Mozilla also had a minority investment in Cliqz at one point. This sort of indexing strategy lives on today in a slightly different form with the Web Discovery Project in Brave Search.

So Cliqz, Tailcat, and Brave Search are intertwined in a way that is hard to explain concisely. There’s plenty more information about this on the web if you’re interested in looking. Brave has a news post about this too.

I’m not saying that this indexing strategy is bad (I certainly don’t know enough about search engines to say so; I certainly think it’s interesting), but just drawing a comparison between Brave Search and Mojeek’s lexical search method. I would guess that most people will find lexical search more frustrating than the way Google and Bing try to “interpret meaning”.

It’s also why you get exactly the right result if you search for privacyguides in Mojeek, but the results are not what you’d expect if you search for Privacy Guides.


Method used:
Mojeek has a fully independent crawler. Brave uses the Google crawler (meaning that when indexing pages it claims to be Google and not Brave), which means less bot blocking but reinforces Google’s position. (Sites will not allow anyone but Google and maybe Bing.)
Brave also has an option to fall back on Google, and displays a “Check on Bing/Google/Mojeek” button.

Source is Brave FAQ.
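
To show what “sites will not allow anyone but Google” looks like in practice, here is a small sketch using Python’s standard `urllib.robotparser`. The robots.txt content and the `SmallIndieBot` user agent are invented for the example; real sites vary, and this says nothing about how any particular crawler identifies itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that welcomes Googlebot and turns everyone else away.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for agent in ("Googlebot", "SmallIndieBot"):
    print(agent, can_crawl(agent, "https://example.com/page"))
```

A crawler that honestly announces its own name gets locked out by rules like these, which is the trade-off being described above.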
Index size:
The two seem to have roughly the same index size. Brave says it has only indexed 10% of the web, or 8B pages, and Mojeek says it has 7B pages. In contrast, Google is estimated to index 80B pages every day.

Experience:
1) Brave is more seamless, as it is integrated into a browser.
2) Brave’s interface is easier to use, as it is more modern and follows the expected modern design language.

On the other hand, Mojeek’s interface is difficult and feels old.

Mojeek is also very sensitive when it comes to bot detection. Entering special characters might trigger it.

Conclusion:
While I like the Mojeek project and its mission, the product is lacking in usability. So I think it shouldn’t be recommended.

That being said, I would like to hear from Searx users about Mojeek.

Sources:
Brave Podcast about Brave Search


Mojeek also has this in the form of search choices, if you weren’t aware. You can turn on and off a total of 13 different engines. By default, they select Brave, Ecosia (Bing proxy), and Startpage (Google proxy). You can also enable the option to display the choices below the search bar at the top.

Could I ask where you got this information? I can’t find it in the pages you linked, or in the Brave Search FAQ. Mojeek releases a blog post for milestones in index size, but Google has always been secretive, and I couldn’t find much recent information about Bing’s index.

If you faced any issues with the interface, absolutely tell Mojeek about them because they’re very responsive. Seirdy once brought some accessibility issues to Mojeek and they were actioned quite quickly.

This is definitely true. I hit it often when I’m searching for errors my console spits out. If you browse community.mojeek.com, you’ll see users make posts about this occasionally. One of the reasons Mojeek’s bot-blocking method has so many false positives is touched on here: Powered by - 403 Error - #13 by Josh - Feedback and Developments - The Mojeek Discourse

They specifically want to avoid using a bot-blocking method that requires JavaScript:

Basically just that our offering should work with it disabled, it should not be a thing which is required to use Mojeek. There have been a few people who have specifically come to us, or in one case added us as default in their browser because of this (scroll down to privacy browser for dodging the preamble).

I have hopes it’ll get better over time.


Here is the source: Interview with Brave Search - by Dmitri Brereton - DKB Blog
It is 10B pages, not 10%, sorry.

For Google, information is in the Quora thread.

This is more of a general problem. Everything feels outdated and is too small. Maybe it’s because I’m used to Google, or to Brave Search’s copy of that design.

Personally, I don’t really want to use other services. But does Mojeek support !g, !ddg, !b to quickly switch to another provider?

I also have hope. They got “angel investors”, so this should ensure at least 5 years of continuation. One point of concern is that they seem to be against AI. (They have proposed a no-ML standard to allow website owners to block their content from being used for AI training.)
I understand the concerns, but Brave Search’s summary is very useful, especially as it cites its sources.

Thanks for the interesting link!

It does not. I don’t use bangs myself (I just add search engines to Firefox), but I can see the utility.

Some discussion threads about the topic:

  1. Implement !bangs - Feedback and Developments - The Mojeek Discourse
  2. I've realized search choices is actually better than !bangs - Feedback and Developments - The Mojeek Discourse
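
For readers who haven’t used bangs, this is roughly how such a feature could work: spot a known bang in the query and redirect the rest of the query to that provider. This is a minimal sketch, not something Mojeek ships. The `BANGS` table and `resolve` function are hypothetical, and the bang codes follow the DuckDuckGo convention mentioned above (with `!b` assumed to mean Bing).

```python
from urllib.parse import quote_plus

# Hypothetical bang table; the search URL patterns are the providers' public ones.
BANGS = {
    "!g": "https://www.google.com/search?q={}",
    "!ddg": "https://duckduckgo.com/?q={}",
    "!b": "https://www.bing.com/search?q={}",
}
DEFAULT = "https://www.mojeek.com/search?q={}"


def resolve(query):
    """Return the URL a query should go to, honoring the first known bang."""
    words = query.split()
    for i, word in enumerate(words):
        template = BANGS.get(word.lower())
        if template:
            rest = " ".join(words[:i] + words[i + 1:])
            return template.format(quote_plus(rest))
    return DEFAULT.format(quote_plus(query))


print(resolve("!g symfony docs"))
print(resolve("symfony docs !ddg"))
print(resolve("symfony docs"))
```

Search Choices gets to the same place with a click instead of a typed code, which is the discoverability trade-off discussed in the threads above.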

The NoML standard makes sense even if you support machine learning, and especially if you support a variant of ethical machine learning where you believe authors should be able to opt-out of having their work used as machine learning data.

For a more detailed look at what exactly Mojeek thinks of generative AI, this is an interesting post: Generative AI Threatens Diversity and Hyperlinks | Mojeek Blog

1) About search choices

I am confused. Since Brave, Ecosia and Startpage are selected by default, does this mean my results are a mix of Mojeek results and the three others? Now I have enabled the search bar option, which allows selecting another search engine.

The reasoning against bangs seems dubious at best. Too addictive, and they get you back to Google? Well, personally I only use Google when I know the results will be bad, and most often after I got bad results. And it’s your choice. The fact is that I see Brave remaining my main search engine, and I like that I can easily use Duck when needed. If Mojeek doesn’t have bangs, then I can’t use it. The truth is that Mojeek’s search results are inferior. Sure, Google doesn’t have bangs and I still use it 5% of the time. At least with bangs, I could just check without losing much time.

Why not make it opt-in?

2) About AI answers

It’s an interesting opinion. I think there were three main concerns. 1) The number of ads makes search useless or untrue. If the Instant Answers are based on this, then Google just approves false information. 2) AI answers remove the option to discover information sources, and instead just allow you to ask Google. 3) The diminishing reliance on actual link visits will make it impossible to classify websites by relevance.

I do agree with 1. However, I don’t agree with 2 or 3. If you have some time, go use Brave Search. You will see that there are a lot of AI summaries at the top. But the good thing is that they give sources. So I see this as: here is a summary, with the 3 best pages about the subject. Just like a good conference talk about a subject.

I may add more info in the future.

About search choices, I am confused. Since Brave, Ecosia and Startpage are selected by default, does this mean my results are a mix of Mojeek results and the three others? Now I have enabled the search bar option, which allows selecting another search engine.

Search Choices is just the same thing as what Brave does by listing Google, Bing, or Mojeek at the bottom of the results page. If Mojeek’s results don’t work for you, you can click one of those buttons to search elsewhere. Mojeek’s results are powered purely by their own index.

That being said, their image results use Pixabay’s and Openverse’s indexes, which will return images that have a more permissive copyright license.

This isn’t a Mojeek staffer’s opinion; just a community member’s. I don’t actually agree with their reasoning. Josh and Colin are Mojeek staff members. Their conclusion at that time was: “Let’s release Search Choices, see what people think, and if people really want bangs, then we’ll think about doing that too.”

Though, Josh does say he agrees with the reasoning that bangs make it very easy to fall into bad habits. Search Choices are only slightly more work to use than bangs, so I don’t agree with that as a reason not to implement bangs. That being said, Firefox makes it super easy to query other search engines anyway, so I haven’t used bangs in a very long time after I initially discovered them and thought they were awesome.

But I do think Search Choices are better because they’re more discoverable and easier to use as compared to say, learning the “bang code” for a search engine.

If all you use bangs for is checking another general search engine like Google, Bing, DuckDuckGo, Ecosia, Startpage, etc., then Search Choices is a drop-in replacement that is more discoverable but maybe a little slower.

It’s a tough balance. If you think about it from the perspective of someone using a search engine, it sure would be handy if your question was answered right at the top and you didn’t even need to visit a site. Most people would find that useful.

On the other hand, if you think about it from the perspective of the person who wrote the information you find useful, which has been taken by the search engine (without their permission) and served to you, this reduces traffic to their site. There was even a big blow-up about that a few years ago with news sites complaining about Google doing this to them (though it was certainly controversial as to who, if either side, was right).

Sure, links will be included at the bottom, but if the snippet answers your question well enough, why would you bother checking?

Mojeek does do instant answers—but only with Wikipedia. Wikipedia explicitly allows their content to be used in this way with their permissive copyright license. So Mojeek is really interested in being fair to those they interact with to deliver their service to their users. So it’s a tricky balance, and it’s not a compromise all users are willing to accept. Maybe Mojeek will change their minds in the future.

When I want a search engine with all these features, I go to Kagi (it does have a Quick Summary feature).

Maybe I just disagree on a deeper level. Journalists take information from other sources, often other newspapers, to make their own articles, albeit as a summary of other articles. I consider this great, because awareness of the article’s subject is increased. Imagine if Reuters couldn’t cite information from The Times, The Post, etc. I think if you write an article, your goal shouldn’t be to make more money, but to increase awareness.

Agreed, although I mainly use Brave, the ability to have like 7 search engines on Mullvad Browser is amazing.

Well, yes and no. Yes, because once set up it is more intuitive. No, because the settings for this aren’t easy to understand: the average person will wonder why Mojeek, a search engine, lists other search engines. And no, because bangs are now a standard among alternative search engines, while Mojeek’s solution isn’t.

Kagi relies on Google though. I try to prefer independent search engines like Brave and Mojeek, then those based on Bing, since they at least challenge the Google monopoly.


Stepping back, I do support recommending Mojeek. Now, it’s true PG already has 3 recommendations. What are others’ opinions on this?

Kagi relies on many search engines, including its own indexes, Teclis (for small sites with less than 5 trackers) and Tinygem (for news). It even relies on Mojeek and Brave. And Marginalia.

Conspicuously absent from the sources page is Bing. I believe they stopped relying on them a year or so ago when Bing dramatically increased their API prices.

I prefer independent search engines like Mojeek and Brave where possible, of course.