I see the Arkenfox Wiki draws a distinction between “naive” and “advanced” fingerprinting. It says that only Tor Browser (and just maybe Mullvad Browser these days?) can fool “advanced” scripts.
The page goes on to say (rather reassuringly!):
So if a fingerprinting script should run, it would need to be universal or widespread (i.e it uses the exact same canvas, audio and webgl tests among others - most aren’t), shared by a data broker (most aren’t), not be naive (most are) and not be just first party or used solely for bot detection and fraud prevention (most probably are)
Thorin seems to know what he’s talking about, but the page was last updated in November 2023.
Is it still true to say that most websites are using only naive fingerprinting? Are there any recent publications or surveys with research/data on this?
Websites mostly follow rational market logic: if it’s easy and cheap, they will implement it.
So beating the cheap and easy is always the biggest win if you only care about defeating mass surveillance and tracking. (It’s different if you’re up against a targeted attack from a state actor, but that’s a different threat model.)
I don’t think the situation has evolved dramatically in a year, but I remember reading some papers about it a few years back. I haven’t checked more recently.
You may want to see what’s up in the PET (Privacy-Enhancing Technologies) academic scene: Google Scholar
Thanks, that is a great resource for more information.
So far, I’ve read this paper, which appears to do a good job of analysing the prevalence of fingerprinting. (I’m obviously no expert or I wouldn’t be asking about this in the first place, but the paper seems to be talking sense based on my own general technical knowledge and what I’ve picked up over the years about fingerprinting.)
It also starts with a nice description of how fingerprinting works. I did already know this, but it was reassuring to see it explained in a straightforward way without any associated scaremongering or with details omitted to make it more intelligible to the average non-technical reader.
The paper does seem to assume that fingerprinting is done using JavaScript, but that’s probably fair enough, especially for mass-surveillance-style fingerprinting.
The paper is dated 2024, but the crawl the analysis is based on took place in March/April 2023 - still, I guess this is fairly recent.
I think the examples in appendix A support the idea that most fingerprinting is naive, though there is no explicit discussion of this aspect. The scripts take some attributes and hash them without attempting any bucketing or checking, which suggests that by the time the server side sees the hash, it is too late to do anything clever to remove random noise added by a browser’s attempts to defeat naive scripts.
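To make the point concrete, a naive hash-and-send script of the kind I mean might look roughly like this. This is a hypothetical sketch, not code from the paper; the attribute values and the FNV-1a hash are my own stand-ins for whatever a real script would collect and hash:

```javascript
// Hypothetical sketch of a "naive" fingerprinting script: it simply
// concatenates raw attribute values and hashes them, with no bucketing
// or sanity-checking. Any random noise the browser injects (e.g. into
// canvas pixels) changes the hash completely, and once only the hash
// reaches the server there is no way to undo that noise.

// 32-bit FNV-1a hash, a typical cheap client-side hash.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16);
}

function naiveFingerprint(attrs) {
  return fnv1a(attrs.join("||"));
}

// Stand-in values for what navigator/screen/canvas would report
// in a real browser.
const fp = naiveFingerprint([
  "Mozilla/5.0 (X11; Linux x86_64)", // navigator.userAgent
  "1920x1080x24",                    // screen dimensions and depth
  "en-GB",                           // navigator.language
  "canvas:deadbeef",                 // stand-in for a hashed canvas data URL
]);
console.log(fp); // the hash is all the server ever sees
```

The key property is that the hash is deterministic for identical inputs but changes completely if any attribute changes by even one bit, which is exactly why random noise defeats this style of script.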
Edit (added the next day rather than posting a separate reply): For anyone with a general interest in fingerprinting who isn’t an expert, this paper provides what seems to be a reasonable overview of fingerprinting techniques and defenses. It does appear to focus exclusively on “naive” scripts as far as I can see, though.
(I have a vague idea of how “advanced” scripts might detect and work around randomization, but I don’t recall ever seeing a concrete example written down and explained.)
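For what it’s worth, one obvious trick is simply to repeat the same measurement: a real canvas returns identical pixels for identical drawing commands, so if two runs of the same probe disagree, the browser must be injecting noise. A hypothetical sketch (the `measure` callback and the stand-in probes are my own, assumed to mean “render a fixed scene and hash the pixels”):

```javascript
// Hypothetical sketch: detecting per-call randomization by sampling
// the same probe repeatedly. If identical measurements disagree, the
// browser is injecting noise, and the script can discard that
// attribute rather than be fooled by it.
function isRandomized(measure, repeats = 3) {
  const first = measure();
  for (let i = 1; i < repeats; i++) {
    if (measure() !== first) return true; // noise detected
  }
  return false;
}

// Stand-ins for a real canvas probe:
const stableCanvas = () => "hash-abc123";  // honest browser: same every time
let n = 0;
const noisyCanvas = () => "hash-" + n++;   // randomizing browser: differs per call

console.log(isRandomized(stableCanvas)); // false
console.log(isRandomized(noisyCanvas));  // true
```

Note that a defense which randomizes with a seed held constant for the whole session would return the same value on every call and so would pass this particular check; defeating that presumably takes cross-session or cross-origin comparison, which is harder.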