@Niek-de-Wilde @jonah
EDIT: Startpage has provided a reason for why this happened. The response that jonah gave makes it appear as if they are using a product from a third party (Amplitude maybe?), which would also explain the content of this post. I have removed the sections of this post claiming malicious intent.
Canvas fingerprinting, along with a few other fingerprinters, appears to have been specifically disabled.
I will be using Chromium instead of Firefox (unlike the initial Reddit post), as the Local Overrides feature is needed, which Firefox does not have. JShelter is also needed to detect Canvas fingerprinting.
There is a difference between the files https://web.archive.org/web/20240518230740js_/https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js (the original tracker) and https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js (the new tracker).
The difference resides in x.zr at https://web.archive.org/web/20240518230740js_/https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js:13:27655 and https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js:1:27680. In the original tracker, the property zr is defined as zr:{}, but in the current tracker, it is instead defined as zr:{"-2":0,"-52":0,"ucf":0,"-67":0}. Using Chromium's local overrides on https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js to replace zr:{"-2":0,"-52":0,"ucf":0,"-67":0} with zr:{} will re-enable Canvas fingerprinting on startpage.com.
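For anyone who would rather patch a saved copy of the script than edit it by hand in DevTools, here is a minimal Python sketch of the same substitution. The function name `restore_original_zr` and the sample string are made up for illustration; only the two zr definitions come from the tracker files above.

```python
# Hypothetical helper: patch a locally saved copy of the tracker script so
# that the zr map is empty again, as it was in the archived version.
CURRENT_ZR = 'zr:{"-2":0,"-52":0,"ucf":0,"-67":0}'  # definition in the current tracker
ORIGINAL_ZR = 'zr:{}'  # definition in the archived tracker


def restore_original_zr(script_text: str) -> str:
    """Return script_text with the current zr definition swapped for the original."""
    if CURRENT_ZR not in script_text:
        raise ValueError("current zr definition not found; the script may have changed")
    # Replace only the first occurrence; the definition is expected to appear once.
    return script_text.replace(CURRENT_ZR, ORIGINAL_ZR, 1)


# Stand-in for the minified script (invented); the real file is far larger.
sample = 'var x={a:1,' + CURRENT_ZR + ',b:2};'
patched = restore_original_zr(sample)
print(patched)
```

The patched text can then be saved as the override file for that URL. If Startpage changes the zr definition again, the helper raises an error instead of silently doing nothing.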
x.zr appears to be an object controlling which fingerprinters are disabled. x.zr[-2], specifically, appears to control Canvas fingerprinting. The other keys likely correspond to other fingerprinting functions.
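To make the guess above concrete, here is a rough Python model of how a disable map like this could gate individual fingerprinters. This is speculation, not the tracker's actual logic: I am assuming that a key being present means the fingerprinter is skipped, and every ID other than the four keys in zr is a placeholder. One thing that is certain is that JavaScript object keys are strings, so x.zr[-2] and x.zr["-2"] read the same property, which the str() call mimics.

```python
# Keys copied from the current tracker's zr definition.
ZR = {"-2": 0, "-52": 0, "ucf": 0, "-67": 0}


def is_disabled(fingerprinter_id, disable_map=ZR):
    """Assumed rule: a fingerprinter is skipped if its ID is a key in the map.

    JavaScript coerces property keys to strings, so str() mimics x.zr[-2]
    resolving to the "-2" entry.
    """
    return str(fingerprinter_id) in disable_map


# Placeholder IDs: -2 is the one tied to Canvas above; -1 and -3 are invented.
candidate_ids = [-1, -2, -3]
still_active = [i for i in candidate_ids if not is_disabled(i)]
print(still_active)
```

With an empty map (the archived zr:{}), nothing is filtered out under this model, which would match Canvas fingerprinting coming back after the override.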
Niek is still chatting with them, but they said they'll be able to post a response here later this week (when their tech lead returns from holiday). What they're describing it as is bot detection in lieu of the captchas they currently use, which they receive negative feedback about.
I'm guessing if this script runs and you'd normally be blocked/captcha'd by their bot detection system, now you'll get a pass without any required interaction? They haven't followed up yet.
We are still functioning according to our privacy standards, meaning that we don't save or share any PII including IP address, and client side signals are only used to determine whether a given user is a bot or not at a given time.
Anyways... I'm still fine with waiting to see if they will in fact join the forum here and share more technical details with all of us.
Any explanation for why they suddenly disabled the Canvas fingerprinter soon after this forum topic got posted? The timing may be a coincidence, but why did they remove only the Canvas fingerprint and nothing else, and why was there code specifically for disabling specific fingerprinters?
Only that they acknowledged it as an "erroneous configuration" they rolled back early last week. I'm not sure why they singled that one out as a concern specifically; hopefully they can share where they're actually drawing the line.
It could be a coincidence. If you believe them (no reason not to, IMO), they found out about this thread for the first time on Friday.
If they really are trying to reduce robot activity, they should also get rid of WebGL and Speech Synthesis. Those are just as bad as Canvas fingerprinting.
We should also question them about this. Do they mention it in their privacy policy? Depending on whether it is a self-hosted tool or run by Amplitude themselves, we should add a warning.
Brave Search (BS) is the default search engine for many regions. However, for many, BS quality will still be vastly inferior to Google's. So providing a fall-back option is a way to mitigate the problem.
If you use Brave Search on another browser, you must have decided that it's the best.
*They say that it is your browser that queries the result, however I am captcha-blocked from Google, so either they contact Google API from your browser, or do it through a third-party server.
As for me, that would be more than enough. They are doing some serious IT business and they have not noticed your mail? Yeah, sure.
C'mon, it's a straight-out lie.
Just the opposite, in fact. Look: they added fingerprinting (intentionally or not; it does not matter here), then they ignored user input, and now, when the community got vocal, they rolled this back and, instead of admitting wrongdoing, they just lie.
There are more than enough reasons not to believe StartPage. In fact, that should be their end.
Comment rewritten because I misread yours earlier.
They did this change before Niek first contacted them, but after this forum topic got created. They would need somebody specifically monitoring this forum to have known about any vocal community at that time. They also apparently had a history of not responding to community feedback, so the three day delay adds up.
They didn't actually roll it back; they just disabled 4 of the ~70 fingerprinters. It was in fact intentional, as they claimed it was for their captcha. I doubt the person that claimed that was the same one who made the decision to put it in. Whether or not it was actually for the captcha and not for tracking users is a separate concern that only the individual who made that decision would know.
Startpage's actions with the tracker seem pretty suspect, but that doesn't mean the employees contacting Niek are involved.
One of our users alerted us to this thread on Friday through our feedback form and we sent over some notes to Jonah & Niek. Reposting them with additional detail.
The most frequent complaint we receive from users, especially privacy-minded ones, is that we block or captcha too many real users, and that the #1 thing we could do to improve their experience is to fix this.
Every day we see millions of bots attempting to crawl our site (even more now that people are trying to train AI models), and we often are subject to DDoS attacks as well. This results in massive expense and risk. We have always had some bot detection in place, but especially in a space where we don't have any idea if someone is a new or returning user (since we don't track IP or drop cookies), real and fraudulent users can look very similar (e.g. those using a free VPN).
In response to this user input, we have begun to implement more sophisticated methodology for bot detection that still honors our privacy policy (which is a very hard problem to solve). Historically we have only had a few signals to determine botlike activity, like country or user agent. Now we are exploring using client-side data to improve the precision of these determinations. We have access to known bot patterns that we are comparing to client information in real time, in order to determine if the current search is being executed by a bot.
Some things we aren't doing:
saving or sharing PII including IP address
storing the search query
associating client information with PII data or session information
saving client information to be used for any purpose other than bot detection
loading any 3rd party assets
As we explore these detection tools we're trying to find the right balance of signals to perform an effective analysis without over-collecting. For example, we released a handful of signals on the 16th, but rolled some back on the 23rd after they were determined to be unhelpful for the context in which they had been deployed. Obviously from a privacy perspective we would prefer not to need any client signals at all, but on the other hand, we have received thousands of notes from users arguing that constantly needing to solve captchas or reaching out to us to be unblocked also exposes them to additional scrutiny and undermines their privacy.
Noting that we have an extremely small team and may not monitor this forum on a regular basis, but if there are follow-ups or ideas feel free to reach out to our Support team.
Hello, Startpage. Can you roll back some more signals? There are still some signals like WebGL and Speech Synthesis (and exact time info, although that's not as bad) that I do not believe are beneficial for bot detection, but are immensely helpful for user tracking purposes.
(I also sent an email to Startpage Support)
Well, I'm glad they responded. Frankly I know nothing about this fingerprinting stuff; it's way out of my league. What is the consensus here, are we dropping Startpage? I've been using them for a while so idk what to think.
For now it mostly seems like a misunderstanding, and delisting is not needed as of now.
Startpage is juggling the problem of keeping their service usable by VPN users in a privacy-respecting manner, while also not letting bots/scrapers abuse their service.
They do this by performing certain fingerprinting techniques to look for common signs that a user is actually a bot.
While I could understand that Startpage may not be willing to share which values they are looking for (as that would tip attackers/bot makers off on what to do to subvert the block), Startpage would have done well to inform users about these types of changes to their site.
I would still like to hear from @spsupport what current fingerprinting techniques they employ, what information is saved, and if they have any future plans to expand on this.
P.S. Welcome to our forum, we are very glad to have you here, and so is our community!
Well, 50/50... we don't know for sure. Though I highly doubt that whoever contacted Niek did so without anyone from management (their boss, or higher) knowing about it.
I highly doubt that those employees were involved. That would be too big of a conspiracy to veil, and somebody would have blown the whistle. I couldn't find anything about any such incident.
The only remaining realistic option (the other ones got crossed out already) for malicious intent would be a high-ranking actor exploiting the preexisting situation, but if that's the case, I doubt their plan would work now, since the employees are claiming they are getting rid of the metrics that aren't good for bot detection, and the majority of their audience is probably using uBlock Origin.
Startpage just needs to get rid of the major fingerprint metrics (WebGL, Speech Synthesis, exact time info, browser plugins), and then all of this is over. The rest don't look sufficient for ad tracking purposes.
Perhaps. Not all people use those though. Firefox, for example, is recommended by Privacy Guides, and that browser doesn't have the features you are describing. (Resist Fingerprinting setting can help get a common fingerprint, but that doesn't work on my computer. Randomized fingerprints need 3 extensions to pull off on regular Firefox, and the average person almost certainly doesn't know which ones)