Why the Term 'Metadata' Is Not Helpful in Discussions Around Privacy

This article proposes that “behavioral surplus” be used in place of “metadata”, because metadata doesn’t sound threatening and because the term has a purpose outside the surveillance realm. Also, Meta being the name of Facebook’s parent company could confuse some people.
Personally, I think “behavioral surplus” is a bit too wordy, but what do you think? Could PG consider using the term? Would you consider using it around others?

Metadata sounds abstract and harmless

People seem to have misunderstood what meta means. A meta research paper is a paper about research papers. A meta video game like Nier is a video game about video games. A metadebate is a debate about the debate, and as Bruce Schneier said: “Metadata is data about data”.

Replacing metadata with something like “behavioural surplus” (a term that sounds like a psychologist and an economist brainstormed it, no offense) is not helpful. I understand it tries to communicate what’s being shared, and I respect the intent. But:

  1. The term will require decades of translation. “Behavior what? Oh, so metadata.” Just think how many computer professionals still need to have TLS explained to them as “the new SSL”.
  2. It makes it harder to discuss technical systems leaking metadata. Behavior of what? The protocol? The client?
  3. The importance of metadata dies when this article dies: “Ex-NSA Chief: 'We Kill People Based on Metadata'” (ABC News). When we stop using the term, the article will stop appearing in search results.

Metadata sounds abstract and innocuous,

Which is why I recommend we never stop linking to the ABC News article whenever a discussion about the importance of metadata comes up.

Surplus is also euphemistic, and both words step away from “data”, which the public is finally associating with “the new oil”. The whole thing also sounds like bike-shedding: if we argue about what we should call metadata, we have much less time for educating people about the importance and aspects of metadata.

while behavioural surplus is currently being used as a privacy invasion tool for profit on a massive scale

I must admit this is the first time I’m hearing about the term. If it was Zuboff’s idea, it has been around for a while but never caught on.

The type of data about us that are scraped and sold for behaviour prediction are a byproduct of the digital interaction with a tool, like a search engine; they do not serve a clear primary purpose of helping to access the primary data

The author seems to want to separate the metadata that allows the company to exploit the user from the data that’s just for record keeping. There isn’t much of the latter. A record about you might have a running serial number as the table’s primary key, but that’s not created by you. The rest is either valuable, potentially valuable, or a literal waste of resources and added complexity that costs a lot of development time.

As the author states, “Language is key in the fight for privacy”. We must be careful about what we include. If we start lumping all “exploitable metadata” (who gets to decide what it covers?) under behavioral surplus, that just muddies the waters, i.e., the history of existing papers, news, and articles. Was that metadata dangerous if it wasn’t labeled as behavioral surplus? The book’s been out since 2018; at what point do we transition? Do people know how to use the term? Or will they just think it’s the new word for metadata?

Secondly, the word metadata has a light, invisible quality to it. You can easily skip it. You don’t really need to worry about it.

No, it’s the opposite. I would 100% gloss over “behavioral surplus” as some corporate euphemism, and I would not even bother to check it out.

Perhaps the renaming of Facebook as Meta, when things got too whistleblowy for the company

What FB decides to call itself should not be a problem. If anything, you can say “they’re after your metadata which is good enough for governments to make kill decisions”.

Also yeah, “WhatsApp has your metadata” isn’t a whistleblowing thing. Today, it’s what the ~whole internet thinks: just do a Google search for “WhatsApp collects metadata”.


I’m also for a better term that would immediately convey that it’s data about data and that it’s very important and very revealing, but this is not it.

Personally, I think it’s much more useful to just focus more attention on the importance of metadata. Teach which kinds there are and how they are dangerous, and, to keep people from taking the “I have nothing to hide” stance, teach them how they can actively fight it with relative ease.

“NSA kill-decision data” could be a decent one; at least you wouldn’t gloss over it.

tl;dr: I’m seeing nothing good and a lot of bad in replacing metadata with “behavioral surplus”.


Firstly, I do think that fundamentally metadata is just a subset of the “behavioral surplus” the author is referring to.

I don’t think the problem that the author has is that metadata is not a good term, even though that’s the problem the author claims to have. The actual problem here is that some(!) people misuse the term “metadata” to describe things other than metadata.

:+1:

The best solution here, in my opinion, is to just call the data being collected what it is: your personal data. There’s nothing meta about most of it in the first place, it’s just new raw data. And then, only use the term “metadata” when exclusively referring to actual metadata.


Thanks for your comments! I may have to change the ending a little if it wasn’t clear that I’m not really trying to promote behavioural surplus as a good replacement term. I’m mainly trying to argue that the term metadata is used incorrectly, which makes it even harder to talk about what is really going on with the scraping and sale of our interactions with platforms and tools.


Thanks for the succinct summary; I feel you’ve understood my point, and you make a good observation about metadata being a subset of what we are talking about.

‘Personal data’ may also muddy the waters of the discussion though, because most of my friends and family would think that refers only to your actual personal data (name, IP, search and email content, etc.). It would still not make it clear to the general public that what is valuable in creating profiles of predictability is how we interact with something like search or Gmail (number of entries per day, spelling idiosyncrasies, time to click, hover time, etc.).

While I agree with some of the criticisms of the phrase ‘behavioural surplus’, I do like that it has the word ‘behaviour’ in it.

Thanks to whoever posted my article here! Nice to discover this forum.


I could see that potentially being the case. I might still lean towards simply “behavioral data” (in addition to “personal data”) in this case. I agree with @maqp that “data” already has the right connotation in people’s minds, so abstracting away from it muddies the waters a bit.

Especially because most people automatically assume a surplus of something is a good thing in most contexts :slight_smile:


How about…

PAID - Personal and Interactive Data?

:grinning_face:


I actually like the term “metadata”! To make it work in privacy discussions though, we are going to need to qualify it further with the type of metadata being talked about.

Behavioral surplus, to me, seems like a sociological or STS term, where the technology itself records the circumstances of our daily actions, which are then considered valuable because they were abstracted into a single data point. Otherwise, my conversation with my friends (and subsequently the circumstances around it) wouldn’t be particularly valuable if nobody was there to record and aggregate it.

Metadata has more formal application in privacy discussions when related to “communications metadata” and how law enforcement utilizes it. Although cyber law and policy is considered a new field, most academic scholars nowadays use “communications metadata” in the classroom and in research papers. It has been normalized because metadata was originally a library and information science term co-opted into this field. Academically speaking of course, I’m sure it has been applied independent of it.

There is room for both in my opinion. But feel free to disagree with me @ThePrivacyDad


Really cool points on both sides! Maybe one could use the term “revealing, indirect data (RID?)”?
“Indirect” implies it isn’t explicitly the message being sent to another person, while “revealing” implies it still telegraphs relatively indicative information.
I feel that using this term would make it somewhat clear that it falls under the category of metadata, but also shows how metadata doesn’t necessarily mean something to be abused for surveillance purposes.

Also, thank you PrivacyDad, for publishing the article!


RID is a really good idea; it implies metadata directly.
Maybe I’ll use the line:
“Metadata, or as I’ve taken to calling it recently, Revealing Indirect Data or RID”
