After looking into Brave’s “Google Fallback Mixing” feature, I’ve noticed some major differences between how it is marketed and how it actually works.
https://search.brave.com/help/google-fallback
Brave promotes the feature with a promise of anonymity, assuring users that it does not threaten their privacy in any way:
- “As Brave launches its private, independent search engine, we must still meet user expectations.”
- “you can allow the Brave browser to anonymously check Google for the same query”
- “Brave Search gives you the option to let your browser anonymously check Google when our results need more depth.”
- “Note that choosing this option has no effect on your privacy.”
- “If you happen to have a Google account, Google will not be able to associate your query with this account.”
The documentation also contradicts itself on whether queries are retained:
- “The result sets are mixed in your browser, and sent back to us for analysis so we can learn what types of queries need more work.”
- “And Brave does not keep your queries in any shape or form.”
Despite the anonymity claims, man-in-the-middle (MitM) network analysis showed that each fallback query sends the user’s real IP address directly to Google.
This is also confirmed by the developer:
https://news.ycombinator.com/item?id=27593801
This allows Google to continue building a shadow profile based on IP, location, time of day, and browser fingerprint, even if the search is not tied to a specific Google account.
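For anyone who wants to check the network observation themselves, here is a minimal sketch of how a list of hostnames exported from a traffic capture could be filtered for direct connections to Google. The function names, the suffix list, and the sample hostnames are all my own assumptions for illustration; they are not taken from Brave’s code or from any particular capture tool.

```python
# Assumption: matching on "google.com" is enough for a first pass; a real
# check would likely need more domains (regional domains, other Google hosts).
GOOGLE_SUFFIXES = ("google.com",)


def is_google_host(host: str) -> bool:
    """Return True if host is google.com or one of its subdomains."""
    host = host.lower().rstrip(".")
    return any(host == s or host.endswith("." + s) for s in GOOGLE_SUFFIXES)


def direct_google_requests(captured_hosts):
    """Filter a list of captured destination hostnames down to Google ones."""
    return [h for h in captured_hosts if is_google_host(h)]


# Hypothetical hostnames, as they might appear in an exported capture.
sample = ["search.brave.com", "www.google.com", "notgoogle.com"]
print(direct_google_requests(sample))
```

If Google hostnames show up in a capture taken on the user’s own machine while fallback is enabled, the browser is connecting to Google itself rather than through an intermediary, which means Google sees the user’s IP address.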
This feature effectively turns the user into a data collection agent for Brave. The documentation and the description of the feature in the search engine settings make it pretty clear what this data exchange is for:
- “For queries where Brave Search is not yet refined, your browser will anonymously check Google for the same query, mix the results for you and send the query data back to us so we can improve Brave Search for everyone.”
From this, I conclude that the operational flow is as follows: The user’s browser makes a direct, non-proxied connection to Google, exposing their IP address. The results from this query are then sent back to Brave’s servers for analysis. In effect, Brave is using the user’s browser as its proxy for querying Google.
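The flow described above can be modeled as a toy sketch that records which party sees which source IP. Everything here is hypothetical (the `Party` class, the function names, the placeholder result string, and the example addresses), and the proxied variant is included only as a contrast; it is not something Brave documents or implements as far as I know.

```python
from dataclasses import dataclass, field


@dataclass
class Party:
    """A server that logs the source IP and payload of each request it gets."""
    name: str
    seen: list = field(default_factory=list)

    def receive(self, source_ip: str, payload: str) -> None:
        self.seen.append((source_ip, payload))


def fallback_direct(user_ip: str, query: str, google: Party, brave: Party) -> None:
    # Step 1: the browser contacts Google itself, so Google logs the user's IP.
    google.receive(user_ip, query)
    google_results = f"results-for:{query}"  # placeholder, not real data
    # Step 2: the browser sends the query data and results back to Brave.
    brave.receive(user_ip, f"{query} | {google_results}")


def fallback_proxied(user_ip: str, proxy_ip: str, query: str,
                     google: Party, brave: Party) -> None:
    # Hypothetical alternative: an intermediary makes the request to Google,
    # so Google logs the proxy's IP instead of the user's.
    google.receive(proxy_ip, query)
    google_results = f"results-for:{query}"  # placeholder, not real data
    brave.receive(user_ip, f"{query} | {google_results}")


# Example addresses are from the documentation-only ranges.
google, brave = Party("Google"), Party("Brave")
fallback_direct("203.0.113.7", "example query", google, brave)
print(google.seen)
```

In the direct variant Google’s log contains the user’s own address next to the query, which is exactly the exposure I am describing. In the hypothetical proxied variant it would only contain the intermediary’s address.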
This means the user shoulders the privacy risk of communicating directly with Google, only for the resulting data to be used to train Brave’s own commercial product. It is a clear instance of using the customer’s resources and privacy exposure to gather competitive intelligence.