Need privacy-focused file formats

(I was about to reply to this thread, and realized I was really starting a new topic.)

Removing file metadata is much harder than it should be. Granted, there are a ton of file formats, and even a given format can change over time. But I think the developers and maintainers of these formats need to do more to isolate metadata and to support simple, one-shot methods for removing it without undermining the integrity of the file (as in, the file should work just fine without it). Similarly, it should be trivially easy to view the metadata.

I know this is easier said than done. Some files are weird hybrids, like a PowerPoint that contains videos, images, etc., each of which may have its own embedded metadata. But I also believe we can get there, and we should start working it out.

Of course, we do have some tools that are good - but not perfect. Dangerzone can be used for this, too, but it’s lossy and complicated.

I also know that you can 'fingerprint' files… but that's just because our file formats (particularly PDFs and Office XML) have gotten ridiculously complicated, and so they can include lots of clues about their owner/origin.
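For a sense of where that stuff hides in Office XML: a .docx/.pptx/.xlsx file is a ZIP package, document-level properties (author, last-modified-by, etc.) live in parts such as `docProps/core.xml`, and embedded images keep their own metadata. Below is a minimal Python sketch, not a complete audit, assuming a standard Office Open XML layout. The function names and the list of "metadata parts" are my own illustrative choices, and the JPEG check only looks for an EXIF APP1 segment before the image data starts.

```python
import zipfile

# Illustrative (not exhaustive) list of OOXML parts that carry document properties.
METADATA_PARTS = ("docProps/core.xml", "docProps/app.xml", "docProps/custom.xml")

def has_exif(data: bytes) -> bool:
    """Return True if a JPEG has an APP1 'Exif' segment before start-of-scan."""
    if data[:2] != b"\xff\xd8":  # not a JPEG (missing SOI marker)
        return False
    i = 2
    while i + 4 <= len(data) and data[i] == 0xFF:
        marker = data[i + 1]
        if marker == 0xDA:  # start of scan: metadata segments come before this
            break
        length = int.from_bytes(data[i + 2:i + 4], "big")  # includes the 2 length bytes
        if marker == 0xE1 and data[i + 4:i + 10] == b"Exif\x00\x00":
            return True
        i += 2 + length  # skip marker + segment
    return False

def audit_ooxml(source):
    """List metadata-bearing parts of an OOXML package (path or file-like object)."""
    findings = []
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            if name in METADATA_PARTS:
                findings.append(("document-properties", name))
            elif "/media/" in name and name.lower().endswith((".jpg", ".jpeg")):
                if has_exif(zf.read(name)):
                    findings.append(("embedded-exif", name))
    return findings
```

Running `audit_ooxml("report.docx")` on a typical Word file would likely flag at least `docProps/core.xml`, and any photos pasted in straight from a phone camera may show up as `embedded-exif`. Which is sort of my point: even *viewing* this takes a script, and stripping it cleanly is harder still.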

Maybe what we need instead is a set of "clean", well-defined base formats… like converting a Word doc to Markdown. That is, some well-known file formats that are inherently free of metadata. We may have to define these… but fine, let's do that.


I agree. This is another case for practising minimalism. We should consider using markdown or other plain text formats when possible.


Standards-compliant Markdown is pretty limited when it comes to document formatting, which is an important part of Word. You can't center text, for example. Or specify fonts. Or… well, comparing Markdown to a Word doc doesn't really seem like a fair equivalence, in my opinion.

I do agree that having a simpler, markdown-like document format to do things similar to what Word does would be very very nice.

Bring back troff šŸ˜€

Or maybe start using TeX

Been there, done that.

Nowadays I use LibreOffice but you are correct that the embedded metadata in photos, etc. probably isn’t stripped out.