The concept hierarchy of privacy, security, and anonymity, and the discourse around these terms, has been a mess for years.
For example, this video by The New Oil says Signal is private because it protects your messages.
Privacy Guides’ own material defines privacy in a similar way.
This video by Switched to Linux describes security as the use of SSL.
This video by the Hated One describes privacy as something that “hides your activity”, and it states Tor is “secure” because Onion Services are E2EE.
Let’s start with the definitions:
- Privacy is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively. (Wikipedia)
- Security is protection from, or resilience against, potential harm. (Wikipedia)
- Anonymity describes situations where the acting person’s identity is unknown. (Wikipedia)
An individual who wants privacy faces the threat of losing it, because seeking privacy is a small form of power grab. Those with the power to know everything about you will want to retain that power. They do not want the cypherpunk dream of “privacy for the individual, transparency for the powerful” to be realized.
Thus, to have privacy, you need security against the threats going after your privacy. To have security, you need to be safe from the threats in your threat model, which include someone breaching your privacy. So the definitions of security and privacy are different, but you can’t have one without the other.
My threat model for security may include other aspects, such as being mugged. Having privacy over the amount of cash in my wallet does not protect me from someone who mugs me hoping to get lucky. So security is a higher-level category than privacy. Some defense against mugging comes from fast legs and, if needed, martial arts.
When we consider anonymity, we’ll realize its definition is narrower than the other two. Being anonymous doesn’t mean I’m safe from someone breaching my privacy by reading the content of my messages, which might contain information that’s damaging to me even if the attacker never learns who I am.
Example: a message might contain the location of a secret cash stash I shared with my spouse “in case something happens to me”. If someone else reads the message and empties the stash, my financial security is lost without my anonymity ever being compromised.
Digital world security
In the information security concept hierarchy we usually run into these two triads:
CIA triad 1: Confidentiality, Integrity, Availability (computer security)
CIA triad 2: Confidentiality, Integrity, Authenticity (cryptography)
Triad 2 is a set of properties that cryptographic protocols, like the ones used by Signal and Tor, must provide.
Triad 1 is the one that’s relevant at this level. Confidentiality is achieved with more tools than just encryption. Your OS password provides some protection against someone accessing your files. Of course, it’s not worth much without encrypting your home directory, but it shows that confidentiality and encryption are not the same thing, even though confidentiality is usually enforced with cryptography.
Integrity is another shared term, but protection against errors in transfer doesn’t require cryptographic hash functions; a CRC checksum might be enough. Cryptography handles integrity better, because it also protects against deliberate tampering, but that doesn’t make the terms the same.
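Here is a minimal Python sketch of that difference. It uses only the standard library (`zlib`, `hmac`, `hashlib`, `secrets`), and the messages are made up for illustration. A CRC-32 catches an accidental bit flip. Because it has no key, though, anyone who alters a message can simply recompute it. An HMAC tag, used here as one example of cryptographic integrity, can only be produced by someone holding the shared secret key.

```python
import hashlib
import hmac
import secrets
import zlib

message = b"Transfer 100 EUR to Alice"

# Non-cryptographic integrity: CRC-32 catches accidental corruption.
crc = zlib.crc32(message)
corrupted = bytes([message[0] ^ 0x01]) + message[1:]  # one bit flipped in transit
print("bit flip detected:", zlib.crc32(corrupted) != crc)

# CRC-32 has no key: an attacker who rewrites the message just recomputes it,
# so the receiver's check passes on the forged message.
tampered = b"Transfer 900 EUR to Mallory"
forged_crc = zlib.crc32(tampered)
print("forgery passes CRC check:", zlib.crc32(tampered) == forged_crc)

# Cryptographic integrity: an HMAC-SHA256 tag requires a secret key shared
# only by the sender and recipient.
key = secrets.token_bytes(32)

def make_tag(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def verify(key: bytes, msg: bytes, tag: bytes) -> bool:
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(make_tag(key, msg), tag)

tag = make_tag(key, message)
print("genuine message verifies:", verify(key, message, tag))

# Without the key, the attacker cannot produce a valid tag for the tampered
# message; here they try with a random key of their own.
attacker_tag = make_tag(secrets.token_bytes(32), tampered)
print("forgery passes HMAC check:", verify(key, tampered, attacker_tag))
```

The HMAC also gives a form of authenticity (triad 2): a valid tag shows the message came from someone holding the key.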
Availability is the one missing from triad 2, and it’s the best example showing that computer security is a wider concept than privacy. Not losing your data has very little to do with privacy, aside from the natural need to keep your backups private.
Thus, we know privacy is only a subset of computer security.
Types of privacy in the digital domain
As established above, the top-level category is computer security. Below it is digital privacy, which is what Privacy Guides is about. In this world, all information we exchange online is either going into or coming out of our devices. That information has two parts: the content and the metadata.
The content, or payload, is the data being transmitted: raw binary data that can be interpreted as text, an image, video, or something else, depending on the case.
Metadata is defined as data about data. It exists on multiple levels:
- TCP/IP packet headers, containing among other things
  - source/destination IP
  - source/destination port
- The application level
  - source/destination call signs (email/XMPP address etc.)
- Content level
  - length of the actual message
  - time the message was actually sent (relevant for high-latency mixnets)
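The next sketch shows how much of this sits outside the encrypted payload. It builds a simplified IPv4 header, plus the port fields of a TCP header, around random bytes that stand in for ciphertext. It then reads the metadata back the way an on-path observer would. The addresses come from the RFC 5737 documentation ranges, the checksum is left at zero, and the TCP header is truncated to its two port fields. It is an illustration, not a valid packet.

```python
import ipaddress
import secrets
import struct

# IPv4 header without options: version/IHL, TOS, total length, ID,
# flags/fragment offset, TTL, protocol, checksum, source, destination.
IPV4_FMT = "!BBHHHBBH4s4s"
TCP_PORTS_FMT = "!HH"  # first 4 bytes of a TCP header: source and destination port

def build_packet(src: str, dst: str, sport: int, dport: int, payload: bytes) -> bytes:
    tcp = struct.pack(TCP_PORTS_FMT, sport, dport)  # truncated TCP header (ports only)
    total_len = 20 + len(tcp) + len(payload)
    ip = struct.pack(IPV4_FMT, (4 << 4) | 5, 0, total_len, 0, 0, 64, 6, 0,  # checksum left at 0
                     ipaddress.IPv4Address(src).packed, ipaddress.IPv4Address(dst).packed)
    return ip + tcp + payload

def observer_view(packet: bytes) -> dict:
    """Metadata an on-path observer reads without any key."""
    fields = struct.unpack(IPV4_FMT, packet[:20])
    ihl = (fields[0] & 0x0F) * 4  # header length in bytes
    sport, dport = struct.unpack(TCP_PORTS_FMT, packet[ihl:ihl + 4])
    return {
        "src_ip": str(ipaddress.IPv4Address(fields[8])),
        "dst_ip": str(ipaddress.IPv4Address(fields[9])),
        "src_port": sport,
        "dst_port": dport,
        "total_length": fields[2],
    }

# Random bytes stand in for an encrypted payload the observer cannot interpret.
ciphertext = secrets.token_bytes(48)
pkt = build_packet("192.0.2.10", "198.51.100.7", 51000, 443, ciphertext)
print(observer_view(pkt))
# {'src_ip': '192.0.2.10', 'dst_ip': '198.51.100.7', 'src_port': 51000, 'dst_port': 443, 'total_length': 72}
```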
Thus, for computer networking, we can split privacy into two top-level categories: content privacy and metadata privacy.
Content privacy
Content privacy is a one-to-one match with the concept of confidentiality that information security uses in its CIA triad. We get confidentiality by using encryption. Encryption, however, only converts a confidentiality problem into a key management problem: any third party that has the key can read the plaintexts. Strong content privacy therefore requires that only the intended parties (the sender and recipient(s)) have access to the key. This is called end-to-end encryption. End-to-end encryption has several forms:
- TLS encryption between your browser and your bank. You are the first party. Your bank is the second party. Everyone else is a third party.
- The Signal protocol between you and your buddies. You are the first party, your buddies are the second party, and everyone else, including the Signal server, is a third party.
- Client-side encryption of data stored in the cloud. You are both the first party (who creates the backup) and the “second party” (who fetches it), and everyone else, including the cloud storage provider, is a third party.
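The key management point can be shown with a toy one-time pad in Python. This is not what TLS or the Signal protocol use. It is simply the easiest correct cipher to write without third-party libraries. The relay is a hypothetical stand-in for any third party that carries the ciphertext. It never holds the key, so only the endpoints can read the message.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    # One-time pad: the key must be random, as long as the message, and never reused.
    if len(key) != len(plaintext):
        raise ValueError("one-time pad key must match the message length")
    return xor_bytes(key, plaintext)

decrypt = encrypt  # XOR with the same key undoes the encryption

# The key exists only on the endpoints (sender and recipient).
message = b"meet at noon"
key = secrets.token_bytes(len(message))
ciphertext = encrypt(key, message)

# The hypothetical relay server (a third party) stores and forwards only ciphertext.
relay_mailbox = [ciphertext]

# The recipient, holding the key, recovers the plaintext.
print(decrypt(key, relay_mailbox[0]))  # b'meet at noon'

# If the relay also held the key (e.g. provider-managed keys), it could read
# everything too: confidentiality has become a question of who holds the key.
```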
Metadata privacy
Metadata privacy means selective disclosure of metadata about the communication to third parties. That metadata includes things like:
- Identities of people communicating
- Date and time of sent communications
- Amount of communication sent
- Group hierarchy (who talks to whom, in what order, etc.)
- Type of data
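Even with perfect content privacy, envelope metadata alone reveals a lot. Below is a Python sketch over a made-up server log for a hypothetical end-to-end encrypted messenger. The names, timestamps, and sizes are invented, and the "coordinator" rule is a deliberately simple heuristic for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical server log. Payloads are end-to-end encrypted, but the server
# still records (sender, recipient, timestamp, ciphertext size in bytes).
log = [
    ("alice", "bob",   "2024-03-01T08:00", 120),
    ("alice", "carol", "2024-03-01T08:01", 118),
    ("alice", "dave",  "2024-03-01T08:01", 121),
    ("bob",   "alice", "2024-03-01T08:30", 40),
    ("carol", "alice", "2024-03-01T08:31", 38),
    ("dave",  "alice", "2024-03-01T08:35", 41),
]

# Who talks to whom, and how often.
pair_counts = Counter((sender, recipient) for sender, recipient, _, _ in log)

# Amount of communication per sender.
bytes_sent = Counter()
for sender, _, _, size in log:
    bytes_sent[sender] += size

# Group hierarchy (simple heuristic): whoever messages the most distinct people.
fan_out = Counter(sender for sender, _ in pair_counts)
likely_coordinator, _ = fan_out.most_common(1)[0]

# Order of communication: who spoke first.
first_sender = min(log, key=lambda row: datetime.fromisoformat(row[2]))[0]

print("likely coordinator:", likely_coordinator, "| first to speak:", first_sender)
# likely coordinator: alice | first to speak: alice
```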
Anonymity
Guess Who is a board game where players ask questions about the characteristics of people in a group. Each answer lets players eliminate the candidates who don’t match, with the ultimate goal of narrowing the field down to a single individual and thus winning the game.
Metadata and content both leak this kind of information. Slowly but surely, the leaked pieces add up until they filter out every other person on the planet, effectively deanonymizing you.
Thus, anonymity cannot exist without sufficient content privacy and metadata privacy.
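The Guess Who process can be sketched in Python as repeated filtering of an anonymity set. The population and attributes below are invented for illustration. Each leak shrinks the set, and the information gained can be measured in bits as log2(original size / remaining size). For scale, singling out one person among roughly 8 billion takes only about 33 bits.

```python
import math

# Invented population; each attribute could leak through content
# (e.g. mentioning where you live) or metadata (e.g. an OS fingerprint).
population = [
    {"name": "P1", "country": "FI", "os": "Linux",   "language": "fi"},
    {"name": "P2", "country": "FI", "os": "Linux",   "language": "sv"},
    {"name": "P3", "country": "FI", "os": "Windows", "language": "fi"},
    {"name": "P4", "country": "FI", "os": "macOS",   "language": "fi"},
    {"name": "P5", "country": "SE", "os": "Linux",   "language": "sv"},
    {"name": "P6", "country": "SE", "os": "Windows", "language": "sv"},
    {"name": "P7", "country": "DE", "os": "Linux",   "language": "de"},
    {"name": "P8", "country": "DE", "os": "macOS",   "language": "de"},
]

def narrow(candidates, attr, value):
    """Keep only the candidates whose attribute matches the leaked value."""
    return [p for p in candidates if p[attr] == value]

leaks = [("country", "FI"), ("os", "Linux"), ("language", "sv")]

remaining = population
for attr, value in leaks:
    remaining = narrow(remaining, attr, value)
    bits = math.log2(len(population) / len(remaining))  # information learned so far
    print(f"{attr}={value}: {len(remaining)} left, {bits:.1f} bits learned")

print("deanonymized:", [p["name"] for p in remaining])
# country=FI: 4 left, 1.0 bits learned
# os=Linux: 2 left, 2.0 bits learned
# language=sv: 1 left, 3.0 bits learned
# deanonymized: ['P2']
```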
Anonymity online is obtained by a collection of tools, not by a single tool. Your anonymity will break if you reveal enough about yourself and someone puts the pieces together, for example from:
- Information about your real life
- Web tracking elements
- Callsigns and IP addresses
Within each of these, the adversary’s capabilities will vary. For example, in the case of IP addresses, a VPN will protect you from a random individual or a commercial entity, but not from a government. Tor will protect you from many authoritarian governments, but not from global passive adversaries like the Five Eyes (FVEY).
Quantity/Schedule information
This information also reveals a lot. When Russia gave the order to attack Ukraine, there was suddenly a LOT of radio communication. Even if it was encrypted, it wasn’t exactly hard to tell what was about to happen.
The standard way to protect this information is traffic flow confidentiality. All devices transmit encrypted noise packets at an even rate, and only the recipient, who decrypts every packet, can tell whether a ciphertext contains an actual message.
However, it’s rarely used in modern-day communications. IPsec supports it, as do more advanced military communication systems.
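Here is a minimal Python sketch of the idea, with several assumptions. The 64-byte cell size, one cell per tick, and the framing (a real/dummy flag byte plus a 2-byte length) are choices made for this example. They are not the format used by IPsec or any other protocol. The encryption step is also omitted. In a real system every cell would be encrypted before sending, which is what hides the flag and length fields from the observer. The sketch compares what an observer sees on the wire with and without TFC.

```python
import secrets
import struct
from collections import deque

CELL_SIZE = 64  # assumed fixed cell size for this sketch
TICKS = 10      # number of send slots to simulate

def make_cell(msg=None):
    """Build one fixed-size cell. Byte 0 marks real (1) vs dummy (0) traffic and
    bytes 1-2 carry the real length. A real system would encrypt the whole cell
    before sending, so an observer could not see these fields."""
    if msg is None:
        return b"\x00" + secrets.token_bytes(CELL_SIZE - 1)
    if len(msg) > CELL_SIZE - 3:
        raise ValueError("message does not fit in one cell")
    body = b"\x01" + struct.pack("!H", len(msg)) + msg
    return body + b"\x00" * (CELL_SIZE - len(body))

def open_cell(cell):
    """Recipient side, after decryption: the real message, or None for a dummy."""
    if cell[0] == 0:
        return None
    (length,) = struct.unpack("!H", cell[1:3])
    return cell[3:3 + length]

def tfc_send(real_messages, ticks=TICKS):
    """Send exactly one cell per tick, real or dummy. Returns what an observer
    sees on the wire: (tick, cell length) pairs."""
    queue = deque()
    observed = []
    for tick in range(ticks):
        queue.extend(real_messages.get(tick, []))
        cell = make_cell(queue.popleft() if queue else None)
        observed.append((tick, len(cell)))
    return observed

def naive_send(real_messages, ticks=TICKS):
    """Send real messages immediately and unpadded: every burst is visible."""
    return [(tick, len(m)) for tick in range(ticks) for m in real_messages.get(tick, [])]

quiet = {}
surge = {tick: [b"move out at dawn"] * 3 for tick in range(3, 6)}  # sudden burst

print("naive views differ:", naive_send(quiet) != naive_send(surge))
print("TFC views identical:", tfc_send(quiet) == tfc_send(surge))
```

Both lines print True. The cost is also visible. During the burst, messages queue up, and two are still waiting when the simulation ends. That is the bandwidth and latency price TFC pays for hiding traffic volume.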
Traffic flow confidentiality can also hide the type of data, the group hierarchy, and the message length. This information might add somewhat to the ‘Guess Who pile of information’, but it has other uses, such as establishing the group hierarchy. So it makes little sense to narrow the protection down to providing only anonymity.
So from this, we can clearly see anonymity is a subset of metadata privacy.
We could put content privacy under metadata privacy, since you can’t have metadata privacy without content privacy. For practical purposes, though, it’s useful to place the two on equal footing.
Doing so gives people sane terminology for the everyday distinction of “privacy vs anonymity”, or rather, for what they mean by it. To them, it means “end-to-end encrypted vs anonymous”, “Signal vs Cwtch”, “confidentiality vs anonymity”.
The existence of traffic flow confidentiality showed that anonymity doesn’t protect all metadata, and confidentiality is, again, the same as content privacy. So “content private vs metadata private” is the proper distinction.
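The hierarchy argued for in this post can also be written down as a small parent map in Python, which makes subset questions mechanical. The node names follow the terms used above. "Quantity/schedule privacy" is a label chosen here for the metadata that traffic flow confidentiality protects beyond anonymity.

```python
# Parent map: each term points to the broader category it belongs to.
PARENT = {
    "privacy": "computer security",
    "content privacy": "privacy",
    "metadata privacy": "privacy",
    "anonymity": "metadata privacy",
    "quantity/schedule privacy": "metadata privacy",  # label chosen for this sketch
}

# Terms the post treats as the same concept.
ALIASES = {"confidentiality": "content privacy"}

def canonical(term):
    return ALIASES.get(term, term)

def ancestors(term):
    """All broader categories of a term, nearest first."""
    chain = []
    term = canonical(term)
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain

def is_subset(narrower, broader):
    """True if `narrower` is the same as, or falls under, `broader`."""
    narrower, broader = canonical(narrower), canonical(broader)
    return narrower == broader or broader in ancestors(narrower)

print(ancestors("anonymity"))                     # ['metadata privacy', 'privacy', 'computer security']
print(is_subset("confidentiality", "privacy"))    # True
print(is_subset("anonymity", "content privacy"))  # False
```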
If you’re having trouble seeing the big picture, this should clear it up a bit.
In the graph, the arrow points to the solution.
With a proper hierarchy of terms, we can argue about which application provides the best overall privacy, which one provides the best metadata privacy, how a tool provides anonymity, whether it has traffic flow confidentiality, and so on.