The collective misunderstanding of Privacy vs Security vs Anonymity

The entire concept hierarchy and discourse between the concepts of privacy, security, and anonymity has been a mess for years.

E.g. This video by The New Oil talks about Signal being private because it protects your messages.

Privacy Guides’ own material defines privacy in a similar way.

This video by Switched to Linux describes security as the use of SSL.

This video by The Hated One describes privacy as something that “hides your activity”, and it states that Tor is “secure” because Onion Services are E2EE.


Let’s start with the definitions:

  1. Privacy is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively. (Wikipedia)

  2. Security is protection from, or resilience against, potential harm. (Wikipedia)

  3. Anonymity describes situations where the acting person’s identity is unknown. (Wikipedia)

An individual who wants privacy faces the threat of losing it, because trying to obtain privacy is a small power grab. Those with the power to know everything about you will want to retain that power. They do not want the cypherpunk dream of “privacy for the individual, transparency for the powerful” to be realized.

Thus, to have privacy, you need security against threats to your privacy. To have security, you need to be safe from everything in your threat model, which includes someone breaching your privacy. Thus, the definitions of security and privacy are different, but you can’t have one without the other.

My threat model for security may include other aspects, such as being mugged. Having privacy over the amount of cash I have in my wallet does not protect me from someone mugging me hoping to get lucky. So security is a higher level category than privacy. Some defense against mugging can be found from fast legs and if needed, martial arts.

When we consider anonymity, we’ll realize its definition is not as high level as the others. If I’m anonymous, it doesn’t mean I’m safe from the threat of someone breaching my privacy by looking at the content of my messages, which might contain information that’s damaging to me without the attacker ever learning who I am.

Example: The content of my message might contain the location of my secret cash stash I shared with my spouse “In case something happens to me”. If someone else reads the message and empties the stash, my financial security is lost without my anonymity being compromised.


Digital world security

In the information security concept hierarchy we usually run into these two triads:

CIA triad 1: Confidentiality, Integrity, Availability (computer security)
CIA triad 2: Confidentiality, Integrity, Authenticity (cryptography)

Triad 2 is the set of properties that cryptographic protocols, like the ones Signal and Tor use, must provide.

Triad 1 is the one that’s relevant at this level. Confidentiality is achieved with more tools than just encryption. Your OS password provides some level of security against someone accessing your files. Of course it’s not worth much without encryption of your home directory, but it shows that confidentiality and encryption are not the same thing, even though confidentiality is usually enforced with cryptography.

Integrity is another shared term, but protection against errors in transfer doesn’t require cryptographic hash functions; a CRC checksum might be enough. Just because cryptography handles this better doesn’t make the terms the same.
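
To make the difference concrete, here is a minimal sketch using only Python’s standard library. The function names and the keys are made up for illustration. The point is only that a CRC catches accidental corruption but anyone can recompute it, whereas a MAC (HMAC-SHA256 here) can’t be recomputed without the shared key.

```python
import hashlib
import hmac
import zlib


def crc_frame(message: bytes) -> bytes:
    """Append a CRC-32 checksum; no secret is involved."""
    return message + zlib.crc32(message).to_bytes(4, "big")


def crc_check(frame: bytes) -> bool:
    """Recompute the CRC-32 over the message part and compare."""
    message, checksum = frame[:-4], frame[-4:]
    return zlib.crc32(message).to_bytes(4, "big") == checksum


def mac_frame(key: bytes, message: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with a shared secret key."""
    return message + hmac.new(key, message, hashlib.sha256).digest()


def mac_check(key: bytes, frame: bytes) -> bool:
    """Recompute the tag with the shared key and compare in constant time."""
    message, tag = frame[:-32], frame[-32:]
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    key = b"hypothetical shared key"
    original = crc_frame(b"pay 100 to Alice")
    # Transmission noise: one flipped bit in the message part.
    noisy = bytes([original[0] ^ 0x01]) + original[1:]
    print("CRC catches noise:", not crc_check(noisy))
    # An attacker just recomputes the CRC for a new message.
    print("CRC accepts forgery:", crc_check(crc_frame(b"pay 900 to Mallory")))
    # Without the real key, the attacker's tag does not verify.
    forged = mac_frame(b"attacker guess", b"pay 900 to Mallory")
    print("MAC accepts forgery:", mac_check(key, forged))
```

So both mechanisms are “integrity”, but only the second one is integrity against an adversary.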

Availability is the one missing from triad 2, and it’s the best example to show that computer security is a wider concept than privacy. Not losing your data has very little to do with privacy, aside from the natural need to ensure the privacy of the backups.

Thus, we know privacy is only a subset of computer security.


Types of privacy in digital domain

As per above, the top-level category is computer security. Below it is digital-world privacy, which is what Privacy Guides is about. In this world, all information we exchange online is either going into or out of our devices. That information has two parts: the content and the metadata.

The content, or payload, is the data being transmitted: raw binary data that can be interpreted as text, image, video, or something else, depending on the case.

The metadata is defined as data about data. It exists on multiple levels:

  1. TCP/IP packet headers containing among other things
    • source/destination IP
    • source/destination port
  2. The application level
    • source/destination call signs (email/XMPP-address etc.)
  3. Content level
    • length of actual message
    • time the message was actually sent (high latency mixnets)

Thus, for computer networking, we can split privacy into two top level categories, content privacy and metadata privacy.
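
As a rough illustration of that split, here is a sketch of a message as a data structure. The class and field names are my own invention, not taken from any real protocol, and the addresses come from the reserved documentation ranges. It shows that an observer who can’t read the payload still gets every other field.

```python
from dataclasses import dataclass


@dataclass
class PacketHeaders:
    """Network-level metadata."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


@dataclass
class AppAddressing:
    """Application-level metadata (email/XMPP address etc.)."""
    sender: str
    recipient: str


@dataclass
class Message:
    headers: PacketHeaders
    addressing: AppAddressing
    sent_at: float   # content-level metadata: when it was sent
    payload: bytes   # the content (ciphertext if the app is E2EE)

    def observable_metadata(self) -> dict:
        """What remains visible even when the payload is unreadable."""
        return {
            "src": (self.headers.src_ip, self.headers.src_port),
            "dst": (self.headers.dst_ip, self.headers.dst_port),
            "sender": self.addressing.sender,
            "recipient": self.addressing.recipient,
            "sent_at": self.sent_at,
            "length": len(self.payload),
        }


if __name__ == "__main__":
    msg = Message(
        PacketHeaders("192.0.2.10", "198.51.100.7", 50000, 443),
        AppAddressing("alice@example.org", "bob@example.org"),
        sent_at=1700000000.0,
        payload=b"\x8f\x02\x11 opaque ciphertext bytes",
    )
    print(msg.observable_metadata())
```

Content privacy is about the `payload` field. Metadata privacy is about everything `observable_metadata()` returns.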

Content privacy

Content privacy is a 1-to-1 match with the concept of confidentiality that information security uses in its CIA triad. We get confidentiality by utilizing encryption. Because encryption is only a mechanism that converts a confidentiality problem into a key management problem (a third party that has the key has access to the plaintexts), strong content privacy requires that only the intended parties (sender and recipient(s)) have access to the key. This is called end-to-end encryption. End-to-end encryption has several forms:

  1. TLS encryption between your browser and your bank. You are the first party. Your bank is the second party. Everyone else is the third party.
  2. Signal protocol between you and your buddy (or buddies). You are the first party. Your buddies are the second party, and everyone else, including the Signal server, is the third party.
  3. Client-side encrypted connection to cloud storage. You are the first party (who creates a backup) and the “second party” (who fetches the backup) and everyone else, including the cloud storage provider is a third party.
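
The “confidentiality problem becomes a key management problem” point can be shown with a deliberately simple sketch. Python’s standard library has no modern cipher, so this uses a one-time pad (XOR with a random key as long as the message) purely as a stand-in. The helper name `xor_bytes` is hypothetical, and a real application would use a vetted AEAD construction instead.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """One-time-pad style XOR; key must be exactly as long as the data."""
    if len(key) != len(data):
        raise ValueError("key and data must have the same length")
    return bytes(d ^ k for d, k in zip(data, key))


if __name__ == "__main__":
    plaintext = b"meet at the usual place"
    key = secrets.token_bytes(len(plaintext))  # held by sender and recipient
    ciphertext = xor_bytes(plaintext, key)     # what the relay/server sees

    # The intended recipient holds the key and recovers the plaintext.
    print(xor_bytes(ciphertext, key))

    # A third party guessing a key gets bytes that mean nothing.
    guess = secrets.token_bytes(len(ciphertext))
    print(xor_bytes(ciphertext, guess))

    # But a third party that is handed the key reads exactly what the
    # recipient reads. Who holds the key decides who the "ends" are.
    server_copy_of_key = key
    print(xor_bytes(ciphertext, server_copy_of_key))
```

Nothing about the ciphertext changes between the last two cases. The only thing that changes is who has the key, which is why “end-to-end” is a statement about key distribution.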

Metadata privacy

Metadata privacy means selective disclosure of metadata about the communication to third parties. That metadata includes things like:

  • Identities of people communicating
  • Date and time of sent communications
  • Amount of communication sent
  • Hierarchy of group (who talks to whom, in what order etc.)
  • Type of data

Anonymity

Guess Who is a board game where the players ask questions about different characteristics in a group of people. This information allows players to eliminate candidates that didn’t match information, with the ultimate goal to narrow down a single individual and thus win the game.

Metadata and content both leak this kind of information. Slowly but surely, the leaks will add up to enough information to filter out all other people on the planet, effectively deanonymizing you.

Thus, anonymity cannot exist without sufficient content privacy and metadata privacy.

Anonymity online is obtained by a collection of tools, not by a single tool. Your anonymity will break if you reveal enough about yourself and someone puts the pieces together:
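
The Guess Who elimination can be sketched in a few lines. The population and its attributes below are entirely made up, and `narrow` is a hypothetical helper. Each leaked fact is harmless on its own, but applied as successive filters they leave one candidate.

```python
def narrow(candidates, **observed):
    """Keep only the candidates matching every observed attribute."""
    return [
        person for person in candidates
        if all(person.get(key) == value for key, value in observed.items())
    ]


# Hypothetical population; real adversaries work with far larger sets.
POPULATION = [
    {"name": "user1", "timezone": "UTC+2", "os": "Linux", "language": "fi"},
    {"name": "user2", "timezone": "UTC+2", "os": "Linux", "language": "de"},
    {"name": "user3", "timezone": "UTC+2", "os": "Windows", "language": "fi"},
    {"name": "user4", "timezone": "UTC-5", "os": "Linux", "language": "fi"},
    {"name": "user5", "timezone": "UTC-5", "os": "macOS", "language": "en"},
    {"name": "user6", "timezone": "UTC+9", "os": "Windows", "language": "ja"},
]

if __name__ == "__main__":
    step1 = narrow(POPULATION, timezone="UTC+2")   # from posting times
    step2 = narrow(step1, os="Linux")              # from a user agent
    step3 = narrow(step2, language="fi")           # from message content
    for step in (step1, step2, step3):
        print(len(step), [p["name"] for p in step])
```

Assuming roughly 8 billion people, about 33 independent yes/no facts (log2 of the population) are enough in principle to single out one person, which is why small leaks matter.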

  • Information about your real life
  • Web tracking elements
  • Callsigns, IP-addresses

Within each of these, the adversary’s capabilities will vary. E.g. in the case of IP addresses, a VPN will protect you from a random individual or a commercial entity, but it will not protect you from a government. Tor will protect you from many authoritarian governments, but not from global passive adversaries like FVEY.

Quantity/Schedule information

This information also tells a lot. When Russia gave the call to attack Ukraine, there was a LOT of radio communication all of a sudden. Even if it was encrypted, it’s not exactly unclear what was about to happen.

The normal way to protect this information is something called traffic flow confidentiality, where all devices transmit encrypted noise packets at an even rate, and whether a ciphertext contains an actual message is known only to the recipient, who decrypts every packet.
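
Here is a minimal sketch of the framing side of that idea. It assumes a fixed frame size and one frame per tick, both of which are arbitrary choices of mine, and all names are hypothetical. Encryption of each frame is assumed and deliberately left out (the standard library has no cipher); without it, an observer could of course tell dummy frames from real ones.

```python
from collections import deque
from typing import Deque, List, Optional

FRAME_SIZE = 64   # hypothetical fixed size of every frame, in bytes
HEADER_SIZE = 2   # length prefix; 0 means "this frame is noise"


def build_frame(payload: bytes = b"") -> bytes:
    """Pad a payload (or nothing) into a frame of exactly FRAME_SIZE bytes."""
    if len(payload) > FRAME_SIZE - HEADER_SIZE:
        raise ValueError("payload does not fit in one frame")
    body = len(payload).to_bytes(HEADER_SIZE, "big") + payload
    return body + bytes(FRAME_SIZE - len(body))


def parse_frame(frame: bytes) -> Optional[bytes]:
    """Recipient side, after decryption: return the payload, or None for noise."""
    length = int.from_bytes(frame[:HEADER_SIZE], "big")
    return frame[HEADER_SIZE:HEADER_SIZE + length] if length else None


def frames_for_ticks(outbox: Deque[bytes], ticks: int) -> List[bytes]:
    """Emit exactly one frame per tick, whether or not there is anything to say."""
    frames = []
    for _ in range(ticks):
        payload = outbox.popleft() if outbox else b""
        frames.append(build_frame(payload))
    return frames


if __name__ == "__main__":
    outbox = deque([b"attack at dawn"])
    sent = frames_for_ticks(outbox, ticks=4)
    print([len(f) for f in sent])          # same size every tick
    print([parse_frame(f) for f in sent])  # only the recipient learns which is real
```

Once each frame is encrypted, the observer sees the same number of same-sized packets per time unit regardless of how much is actually being said.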

But it’s rarely used in modern-day communications. IPsec supports it, as do more advanced military communications systems.

Traffic flow confidentiality can also hide the type of data, the group hierarchy, and the message length. While this information might contribute somewhat to the ‘Guess Who pile of information’, it also has other uses for an adversary, such as establishing the group hierarchy, so its protection should not be framed as serving anonymity only.


So from this, we can clearly see anonymity is a subset of metadata privacy.

While we could put content privacy under metadata privacy because you can’t have metadata privacy without content privacy, it’s useful for practical purposes to place the two on an equal footing.

It ensures people have sane terminology when they need to make the everyday distinction of “privacy vs anonymity”, or rather, what they mean by it. To them, it means “end-to-end encrypted vs anonymous”, “Signal vs Cwtch”, “confidentiality vs anonymity”.

From the existence of traffic flow confidentiality, we saw anonymity doesn’t protect all metadata. And confidentiality is again the same as content-privacy. So “Content private vs metadata private” is the proper distinction.

If you’re having trouble getting the big picture, this should clear it up a bit:

In the graph, the arrow points to the solution.

With a proper hierarchy for the terms, we can argue about which application provides the best overall privacy, which one provides the best metadata privacy, how the tool provides anonymity, whether it has traffic flow confidentiality, etc.


Privacy vs Security vs Anonymity


Privacy is the assurance that your data is only seen by the parties you intend to view it.

So when someone says “Signal is private” that means my identity remains private to me only?

Security is the ability to trust the applications you use

This is a narrow and vague definition for an already defined word.

that the parties involved are who they say they are

This is provided by authenticity attribute of the cryptographic protocol used in a content-secure app. Security already has a definition.

security can be provided by HTTPS certificates.

My threat model includes a situation where my anonymity breaks. Is my security wrt my threat model ok with HTTPS certificates?

Anonymity is the ability to act without a persistent identifier.

So if I go war driving and use a random WiFi access point and Tor to connect to an imageboard, and tell everything about my work as the lead developer of my own application, I’m anonymous?

I’m afraid the article you linked only contributes to the mess.


The protocol itself has no notion of “you”. “You” look different to the Signal Server and to different contacts (unless they’ve got your phone number, which is no longer a requirement). The Signal app, however, does.

That does not answer the question. What does it mean when someone says “Signal is private”?

The current terminology wants that to mean it’s end-to-end encrypted. It is, but the phrase also implies Signal is providing best-practice privacy for everything. It isn’t. Cwtch has a stronger notion of metadata privacy.

Privacy is again, a wider concept. I don’t need end-to-end encryption to address the concern of my neighbors looking through my windows. That need for privacy is solved by curtains.

Surely this definition of private == E2EE isn’t to try to cater to “Signal Private Messenger” Play Store title?

It is private inasmuch as that is guaranteed by the protocol and its implementation (e.g. WhatsApp vs Signal).

Per the Signal Foundation, there’s minimal metadata leak, and not much in any meaningful way that compromises “identity” (ref).

Perhaps you should read the specifications themselves.

X3DH provides forward secrecy and cryptographic deniability.

X3DH doesn’t give either Alice or Bob a publishable cryptographic proof of the contents of their communication or the fact that they communicated.

Authentication … does not necessarily prevent an “identity misbinding” … when an attacker (“Charlie”) falsely presents Bob’s identity key fingerprint to Alice as his (Charlie’s) own, and then either forwards Alice’s initial message to Bob, or falsely presents Bob’s contact information as his own.

The effect of this is that Alice thinks she is sending an initial message to Charlie when she is actually sending it to Bob.

Ref: Extended Triple Diffie-Hellman.

So “makes attempts towards providing some level of privacy”. The problem is that doesn’t tell you anything about the properties of Signal. It’s also redefining privacy from end-to-end encryption again.

Content-private by design, tells you it’s end-to-end encrypted.
Metadata-private by policy, tells you it opts out from collecting metadata.

Per the Signal Foundation, there’s minimal metadata leak, and not much in any meaningful way that compromises “identity”

Yeah I’m well aware of there being six pieces of court evidence showing Signal collects only the unix timestamps of registered and last_seen. I’m talking about how we can categorize apps without using vague terms.

Metadata private by policy explains this. We do not say metadata private by design, because Signal is not designed in a way where the client protects your metadata the same way E2EE protects your content. Cwtch does that.
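
The categorization I’m arguing for could look something like this as a data structure. It’s only a sketch: the class and enum names are mine, the Signal row and Cwtch’s metadata column come from the claims in this thread, and Cwtch being end-to-end encrypted is my own assumption rather than something stated above.

```python
from dataclasses import dataclass
from enum import Enum


class Protection(Enum):
    NONE = "none"
    BY_POLICY = "by policy"   # the operator opts out of collecting it
    BY_DESIGN = "by design"   # the client protects it technically


@dataclass(frozen=True)
class MessengerProfile:
    name: str
    content_privacy: Protection
    metadata_privacy: Protection

    def label(self) -> str:
        return (
            f"{self.name}: content privacy {self.content_privacy.value}, "
            f"metadata privacy {self.metadata_privacy.value}"
        )


# Rows reflect the claims made in this thread, not an independent audit.
PROFILES = [
    MessengerProfile("Signal", Protection.BY_DESIGN, Protection.BY_POLICY),
    MessengerProfile("Cwtch", Protection.BY_DESIGN, Protection.BY_DESIGN),
]

if __name__ == "__main__":
    for profile in PROFILES:
        print(profile.label())
```

Two columns with three possible values each already say more than the single word “private” does.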

Perhaps you should read the specifications themselves.

I’ve done work in the private messaging world for 15 years. Three guesses if I’m more than adequately well versed with the intricacies of the protocol.

The problem is not me getting the details of individual products. The problem is, as the title says, the collective misunderstanding of terminology, and their hierarchy.

We are re-defining very high level terminology to mean things like properties of cryptographic protocols.

If privacy means an E2EE protocol’s confidentiality, then we lack the terminology to talk about the privacy of metadata. We also lose the ability to talk about all other forms of privacy. I’m also aware that context has a role to play. There are obviously no curtains when we talk about app security. But again, a blanket statement such as “Signal is private” omits too much, because private messaging takes more than just E2EE.


Yeah, Signal is way more than just e2ee.

No. It’s “deniability”.

Yeah, Signal is way more than just e2ee.

Exactly my point. Private messaging takes more than E2EE. Private messaging takes content privacy and metadata privacy.

If privacy means E2EE protocol’s confidentiality,

No. It’s “deniability”.

I’m sorry what, now privacy means protocol’s deniability?

Yes, end-to-end encryption that uses MACs as opposed to digital signatures provides deniability. Which is metadata privacy. It protects the authorship of the message from being proven to third parties.

Signal is doing that. Which shows that it’s doing more for privacy than the definition “privacy == E2EE” allows.

To discuss properties like deniability, wouldn’t you agree we should have an umbrella term for properties like that one? May I suggest “metadata-privacy”. It would make it a lot easier to make a chart of related properties to compare products.
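
A small sketch of why a MAC gives deniability, using HMAC-SHA256 from Python’s standard library. The key, the messages and the helper names are invented for illustration, and this is not how the Signal protocol derives or uses its keys. It only shows that both holders of a shared key can produce identical tags.

```python
import hashlib
import hmac


def make_tag(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag; anyone holding the key can do this."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag by recomputing it with the same shared key."""
    return hmac.compare_digest(make_tag(key, message), tag)


if __name__ == "__main__":
    shared_key = b"hypothetical key known to Alice and Bob"

    # Alice authenticates a message to Bob.
    from_alice = b"see you at eight"
    alice_tag = make_tag(shared_key, from_alice)
    print("Bob accepts Alice's message:", verify(shared_key, from_alice, alice_tag))

    # Bob can produce an equally valid tag for words Alice never wrote.
    never_said = b"I confess to everything"
    bob_made_tag = make_tag(shared_key, never_said)
    print("Forgery verifies too:", verify(shared_key, never_said, bob_made_tag))
```

Bob knows he didn’t write the first message, so the tag convinces him. A third party shown the key and the tag can’t tell which of the two produced it. With a digital signature, only the holder of the private key could have made it, and that is exactly the proof deniability removes.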


Yep (in fact, I think both of us agree almost on everything). :+1:


I’m not alone here. The same terminology was proposed by Arvid Lunnemark in “It Is Time to Move Beyond End-to-End Encryption”.

(They also proposed the notion of complete privacy, which provides both content and metadata privacy. I’m unsure if this is necessary, given that we run into the nasty problem of also having to address the endpoint, from software to hardware, including EMSEC. And even that might not be enough. “Complete privacy” is a bit dangerous as a term, and something of an oxymoron.)

Lunnemark also formalized metadata privacy in this paper he co-authored: https://eprint.iacr.org/2022/1139.pdf

Another paper using the terminology “metadata privacy”: https://pages.cs.wisc.edu/~chatterjee/papers/popets24-mohito.pdf

https://arxiv.org/pdf/2304.02810 talks about “end-to-end encryption assumptions around content privacy”

So it’s not like this terminology isn’t established to some extent already.
