I was actually kind of disappointed by the replies here.
But then I went through each article, and there isn’t anything that bad, actually. Everything was addressed, and most of the issues seem to have been bugs, which happen everywhere.
The only one that I’m not sure about is this one:
Also in 2023, Brave got caught scraping and reselling people’s data with their custom web crawler, which was designed specifically not to announce itself to website owners.
Brave replied with this:
Hello Alex,
Thanks for reaching out for comments about your article, but in the future we would appreciate it if you would check in with us before publishing your piece in order to fact-check your assumptions. There are several items in your article that are inaccurate and which lead to unnecessary confusion.
The rights being mentioned are not rights to content, copyrighted or not, as the article misleadingly seems to imply. The rights are to the output of the API request, which is a set of results to a query sent by the API user. Brave Search has the right to monetize and put terms of service on the output of its search-engine. The “content of web page” is always an excerpt that depends on the user’s query, always with attribution to the URI of the content. This is a standard and expected feature of all search engines.
Where you see Brave Search API as a way to shamefully make money, we see it as a service to all the people who want to innovate on search and LLMs, who could use only Microsoft Bing Search API, which is in reality a monopoly (Google’s search API is not open-access). This is a pretty different take, not as clickbait-y though.
There are also some doubts towards how crawling is done, which could have been solved by asking before publishing.
Brave Search has a crawler which is partially powered by information provided by users enrolled in the Web Discovery Project (WDP) option in Brave browser’s search settings, which is an off-by-default AKA opt-in, privacy-preserving system with multiple mechanisms to prevent Brave from knowing who is contributing what (WDP is open-source for inspection by anyone).
The reason we do not expose a crawler user-agent is practical: we do not have the resources to contact all domain-owners, who rightfully or not, discriminate against anyone but Google. If a domain or page is not crawlable by any search engine (it has a no-index tag), or if it is not crawlable by googlebot, then Brave Search’s bot will not crawl it either.
Regards,
Josep M. Pujol
Chief of Search at Brave
To which the journalist added these comments:
This initial email doesn’t do a great job of answering things like:
- How does Brave handle various licenses? Is there an automated system to check a site’s license and then skip things like showing 260-word blurbs of word-for-word copy/pasted content? For example, if I was to add a CC BY-NC-ND license to this site, how would Brave handle it? This particular license clearly states that the content cannot be used for commercial purposes.
- Brave’s reasoning for not disclosing their Search Crawler is that it is for “practical reasons” as they don’t have the resources to contact all domain owners who “block” or “discriminate” against them. That doesn’t make any sense whatsoever and feels like a red flag of bypassing websites explicitly blocking their crawler. Unless, of course, you choose to block Google’s crawler, and then that will make Brave happy.
And as far as calling my article an assumption goes, it quite literally says on their Brave Search API page that you get “Rights to use data for AI inference”. That same page does not explain licenses, as it stands – their API is a pipeline that you can use to gather fine-grained data.
Brave doesn’t care if it’s licensed one way or another; apparently, they can monetize other people’s licensed content because they’re a search engine.
I’m genuinely curious how “content” and “output of the API request” are two different things, particularly when I already showcased that they give you word-for-word “extra snippets”.
Any inputs from the community on that particular one?
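To make the crawl rule from Brave's email concrete, here is a rough Python sketch of how I read it: skip a page if Googlebot is not allowed to fetch it, or if the page carries a noindex tag. This is only my interpretation, not Brave's actual code. The names `would_crawl` and `_RobotsMetaParser` are made up for illustration, and I am assuming that "crawlable by googlebot" means what the site's robots.txt says for the `Googlebot` user agent.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        # Assumption: a tag addressed to "googlebot" counts the same as one addressed to "robots".
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def would_crawl(robots_txt: str, url: str, html: str) -> bool:
    """Return True only if Googlebot may fetch the URL and the page has no noindex tag."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch("Googlebot", url):
        return False  # blocked for Googlebot, so (per the email) skipped as well
    meta = _RobotsMetaParser()
    meta.feed(html)
    return "noindex" not in meta.directives
```

Note what this implies if the reading is right: a site that blocks only a Brave-specific user agent in robots.txt would have no effect, because the check is made against Googlebot's rules. That seems to be exactly what the journalist is objecting to.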
As for Chromium-based browsers, the person went with Vivaldi. But it seems Cromite is also suggested by others here?
What about Iridium? Or Falkon?