StartPage has apparently started to fingerprint users

@Niek-de-Wilde @jonah
EDIT: Startpage has provided a reason for why this happened. The response that jonah gave makes it appear as if they are using a product from a third party (Amplitude maybe?), which would also explain the content of this post. I have removed the sections of this post claiming malicious intent.

Canvas fingerprinting, along with a few other fingerprinters, appears to have been specifically disabled.

I will be using Chromium instead of Firefox (unlike the initial Reddit post), because this requires the Local Overrides feature, which Firefox does not have. JShelter is also needed to detect Canvas fingerprinting.

There is a difference between the files https://web.archive.org/web/20240518230740js_/https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js (the original tracker) and https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js (the new tracker).

The difference resides in x.zr at https://web.archive.org/web/20240518230740js_/https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js:13:27655 and https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js:1:27680. In the original tracker, the property zr is defined as zr:{}, but in the current tracker, it is instead defined as zr:{"-2":0,"-52":0,"ucf":0,"-67":0}. Using Chromium’s local overrides on https://vf.startpage.com/sxp/i/fa4874d0f7f644dec8ad457f0db0a852.js to replace zr:{"-2":0,"-52":0,"ucf":0,"-67":0} with zr:{} will re-enable Canvas fingerprinting on startpage.com.
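
The override itself is just a text substitution. Here is a minimal Python sketch of the same edit, assuming the script has been saved locally and that the zr literal appears in the flat, minified form quoted above. The `patch_zr` helper and the sample string are hypothetical and only stand in for the real file:

```python
import re

# Matches the minified literal zr:{...} with a flat (non-nested) body.
ZR_PATTERN = re.compile(r'zr:\{[^{}]*\}')


def patch_zr(script_text: str) -> str:
    """Return script_text with the first zr object replaced by an empty one."""
    patched, count = ZR_PATTERN.subn('zr:{}', script_text, count=1)
    if count == 0:
        raise ValueError('no zr literal found; the script layout may have changed')
    return patched


# Stand-in for a fragment of the tracker, not the real file contents.
sample = 'x={a:1,zr:{"-2":0,"-52":0,"ucf":0,"-67":0},b:2}'
patched_sample = patch_zr(sample)
```

The patched text is what you would save as the local override for that URL in Chromium’s DevTools.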

x.zr appears to be an object controlling which fingerprinters are disabled. x.zr[-2], specifically, appears to control the Canvas fingerprinting. The other keys likely correspond to other fingerprinting functions.
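
To illustrate what I think is going on, here is a small Python model of that gate. It is only a guess at the logic: I am assuming that a key being present in zr means the matching collector is skipped, and the collector names and return values below are made up. Only the "-2" key is actually tied to Canvas in my testing.

```python
from typing import Callable, Dict

# Current value of zr as quoted above; the original tracker had an empty object.
ZR_CURRENT = {"-2": 0, "-52": 0, "ucf": 0, "-67": 0}


def run_collectors(
    collectors: Dict[str, Callable[[], str]],
    zr: Dict[str, int],
) -> Dict[str, str]:
    """Run every collector whose key is absent from zr (assumed semantics)."""
    return {key: collect() for key, collect in collectors.items() if key not in zr}


# Hypothetical collectors; "-3" is an invented key for a signal that stays on.
collectors = {
    "-2": lambda: "canvas-hash",
    "-3": lambda: "some-other-signal",
}

with_override = run_collectors(collectors, {})             # zr:{} re-enables Canvas
without_override = run_collectors(collectors, ZR_CURRENT)  # Canvas is skipped
```

Under that assumption, emptying zr brings back every collector at once, which matches what I see with the local override.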


Niek is still chatting with them, but they said they’ll be able to post a response here later this week (when their tech lead returns from holiday). What they’re describing it as is bot detection in lieu of the captchas they currently use, which they receive negative feedback about.

I’m guessing if this script runs and you’d normally be blocked/captcha’d by their bot detection system, now you’ll get a pass without any required interaction? They haven’t followed up yet.

We are still functioning according to our privacy standards, meaning that we don’t save or share any PII including IP address, and client side signals are only used to determine whether a given user is a bot or not at a given time.

Anyways… I’m still fine with waiting to see if they will in fact join the forum here and share more technical details with all of us.


Any explanation for why they suddenly disabled the Canvas fingerprinter soon after this forum topic got posted? The timing may be a coincidence, but why did they remove only the Canvas fingerprint and nothing else, and why was there code specifically for disabling specific fingerprinters?

Only that they acknowledged it as an “erroneous configuration” they rolled back early last week. I’m not sure why they singled that one out as a concern specifically; hopefully they can share where they’re actually drawing the line.

It could be a coincidence. If you believe them (no reason not to IMO), they found out about this thread for the first time on Friday :man_shrugging:


If they really are trying to reduce robot activity, they should also get rid of WebGL and Speech Synthesis. Those are just as bad as Canvas fingerprinting.


We should also question them about this. Do they mention it in their privacy policy? Depending on whether it is a self-hosted tool or run by Amplitude themselves, we should add a warning.


This might be for two reasons:

  1. Cost and Google API limits.*
  2. Brave Search (BS) is the default search engine for many regions. However, for many, BS quality will still be vastly inferior to Google’s. So providing a fall-back option is a way to mitigate the problem.

If you use Brave Search on another browser, you must have decided that it’s the best.

*They say that it is your browser that queries the results. However, I am captcha-blocked from Google, so either they contact the Google API from your browser or they do it through a third-party server.

As for me, that would be more than enough. They are doing some serious IT business and they have not noticed your mail? Yeah, sure :wink:

C’mon, it’s a straight-out lie.

Just the opposite in fact. Look: they added fingerprinting (intentionally or not; it does not matter here), then they ignored user input, and now that the community has gotten vocal, they rolled it back and, instead of admitting wrongdoing, they just lie.

There are more than enough reasons not to believe StartPage. In fact, that should be their end.

Comment rewritten because I misread yours earlier.

They made this change before Niek first contacted them, but after this forum topic got created. They would need somebody specifically monitoring this forum to have known about any vocal community at that time. They also apparently had a history of not responding to community feedback, so the three-day delay adds up.

They didn’t actually roll it back; they just disabled 4 of the ~70 fingerprinters. It was in fact intentional, as they claimed it was for their captcha. I doubt the person who claimed that was the same one who made the decision to put it in. Whether or not it was actually for the captcha and not for tracking users is a separate concern that only the individual who made that decision would know.

Startpage’s actions with the tracker seem pretty suspect, but that doesn’t mean the employees contacting Niek are involved.

They do mention Amplitude in their Privacy Policy, as their anonymous analytics tool.


Hi from Startpage:

One of our users alerted us to this thread on Friday through our feedback form and we sent over some notes to Jonah & Niek. Reposting them with additional detail.

The most frequent complaint we receive from users, especially privacy-minded ones, is that we block or captcha too many real users, and that the #1 thing we could do to improve their experience is to fix this.

Every day we see millions of bots attempting to crawl our site (even more now that people are trying to train AI models), and we often are subject to DDoS attacks as well. This results in massive expense and risk. We have always had some bot detection in place, but especially in a space where we don’t have any idea if someone is a new or returning user (since we don’t track IP or drop cookies), real and fraudulent users can look very similar (e.g. those using a free VPN).

In response to this user input, we have begun to implement more sophisticated methodology for bot detection that still honors our privacy policy (which is a very hard problem to solve). Historically we have only had a few signals to determine botlike activity, like country or user agent. Now we are exploring using client-side data to improve the precision of these determinations. We have access to known bot patterns that we are comparing to client information in real time, in order to determine if the current search is being executed by a bot.

Some things we aren’t doing:

  • saving or sharing PII including IP address
  • storing the search query
  • associating client information with PII data or session information
  • saving client information to be used for any purpose other than bot detection
  • loading any 3rd party assets

As we explore these detection tools we’re trying to find the right balance of signals to perform an effective analysis without over-collecting. For example, we released a handful of signals on the 16th, but rolled some back on the 23rd after they were determined to be unhelpful for the context in which they had been deployed. Obviously from a privacy perspective we would prefer not to need any client signals at all, but on the other hand, we have received thousands of notes from users arguing that constantly needing to solve captchas or reaching out to us to be unblocked also exposes them to additional scrutiny and undermines their privacy.

Noting that we have an extremely small team and may not monitor this forum on a regular basis, but if there are follow-ups or ideas feel free to reach out to our Support team.


Hello, Startpage. Can you roll back some more signals? There are still some signals like WebGL and Speech Synthesis (and exact time info, although that’s not as bad) that I do not believe are beneficial for bot detection, but are immensely helpful for user tracking purposes.
(I also sent an email to Startpage Support)

Well I’m glad they responded. Frankly I know nothing about this fingerprinting stuff, it’s way out of my league. What is the consensus here, are we dropping startpage? I’ve been using them for awhile so idk what to think.


It mostly seems like a misunderstanding, and delisting is not needed as of now.

Startpage is juggling the problem of keeping their service usable by VPN users in a privacy-respecting manner, while also not letting bots/scrapers abuse their service.

They do this by performing certain fingerprinting techniques to look for common signs that a user is actually a bot.
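
To make that concrete, here is a rough Python sketch of what such a check could look like in principle. This is not Startpage’s code or rule set, which we have not seen. The `ClientSignals` type and the two patterns are my own illustrative assumptions, based on signals that automated browsers commonly expose (the `navigator.webdriver` flag and a `HeadlessChrome` user agent):

```python
from dataclasses import dataclass


@dataclass
class ClientSignals:
    user_agent: str
    webdriver: bool  # value of navigator.webdriver as reported by the client


def looks_like_bot(signals: ClientSignals) -> bool:
    """Flag a request when it matches any of the illustrative bot patterns."""
    if signals.webdriver:
        # Browsers driven by automation tooling normally set this flag.
        return True
    if "HeadlessChrome" in signals.user_agent:
        # Default user agent token of Chrome running in headless mode.
        return True
    return False


regular_user = ClientSignals(user_agent="Mozilla/5.0 Firefox/126.0", webdriver=False)
headless = ClientSignals(user_agent="Mozilla/5.0 HeadlessChrome/124.0", webdriver=False)
```

A check like this only needs the signals of the current request, so it can in principle run without storing anything about the user.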

While I could understand that Startpage may not be willing to share which values they are looking for (as that would tip attackers/bot makers off on what to do to subvert the block), Startpage would have done well to inform users about these types of changes to their site.

I would still like to hear from @spsupport what current fingerprinting techniques they employ, what information is saved, and if they have any future plans to expand on this.

P.S. Welcome to our forum, we are very glad to have you here, and so is our community!


Yeah that is what I was thinking.

Also AFAIK this fingerprinting is just for bot purposes and stays strictly internal to StartPage?

I’m also using CanvasBlocker if that helps.

Well, 50/50… we don’t know for sure. Though I highly doubt that whoever contacted Niek did so without anyone from management (their boss, or higher) knowing about it.

I highly doubt that those employees were involved. That would be too big of a conspiracy to veil, and somebody would have blown the whistle. I couldn’t find anything about any such incident.

The only remaining realistic option (the other ones got crossed out already) for malicious intent would be a high-ranking actor exploiting the preexisting situation, but if that’s the case, I doubt their plan would work now, since the employees are claiming they are getting rid of the metrics that aren’t good for bot detection, and the majority of their audience is probably using uBlock Origin.

Startpage just needs to get rid of the major fingerprint metrics (WebGL, Speech Synthesis, exact time info, browser plugins), and then all of this is over. The rest don’t look sufficient for ad tracking purposes.

who cares what metrics they collect, if you’re using a browser like brave/mullvad/tor then it doesn’t matter, that is literally their entire point

Perhaps. Not all people use those though. Firefox, for example, is recommended by Privacy Guides, and that browser doesn’t have the features you are describing. (Resist Fingerprinting setting can help get a common fingerprint, but that doesn’t work on my computer. Randomized fingerprints need 3 extensions to pull off on regular Firefox, and the average person almost certainly doesn’t know which ones)


wtf