Stylometry circumvention

Stylometry is the study of an author’s writing style, often with the aim of revealing their identity.

Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author’s other identities,[29] which, for example, creates difficulties for whistleblowers,[30] activists,[31] and hoaxers and fraudsters.[32] The privacy risk is expected to grow as machine learning techniques and text corpora develop.[33]

This is a complex topic with many layers, not only in text analysis but also in music, paintings, and other media. A good starting point, though, might be command-line tools that analyze basic punctuation patterns, sentence endings, and their forms. Of course, this assumes all spelling and grammatical mistakes have already been eliminated.
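As a minimal sketch of that starting point (in Python rather than shell tools, and with a deliberately tiny, illustrative feature set — real stylometric analysis uses far richer features):

```python
from collections import Counter
import re

def punctuation_profile(text):
    """Toy stylometric feature extractor: counts punctuation marks
    and classifies sentence endings (period, question, exclamation,
    ellipsis). Illustrative only."""
    # Raw frequency of individual punctuation characters.
    punct = Counter(ch for ch in text if ch in ".,;:!?'\"()")
    # How sentences end; match "..." before single terminators.
    endings = Counter(m.group() for m in re.finditer(r"\.\.\.|[.!?]", text))
    return punct, endings

p, e = punctuation_profile("Who wrote this? Nobody knows... Or do they!")
```

Comparing such profiles across texts is the crudest form of stylometry; habitual ellipsis use or unusual comma density alone can narrow down an author pool.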


It is possible to utilize a large language model as a means of evading stylometry, employing strategies comparable to those commonly used by students to evade plagiarism.
- academic style

Yo, listen up, let me break it down
You can drop LLM to evade stylometry, don’t be a clown
Just like the kids today, they use tactics to dodge that plagiarism trap
So get with the times, and level up your game, son.
- rap style

Above quotes courtesy of Vicuna LLM stylistic reinterpretation.
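The approach above boils down to a rewrite prompt fed to a model. A hedged sketch of such a template follows; the exact wording, and the idea of sending it to a locally hosted model, are my own assumptions rather than a tested recipe:

```python
def rewrite_prompt(text, style="neutral academic"):
    """Build a style-transfer prompt for a locally hosted LLM.
    The wording is illustrative, not a vetted recipe."""
    return (
        f"Rewrite the following passage in a {style} style. "
        "Preserve the meaning but change sentence structure, "
        "word choice, and punctuation habits:\n\n" + text
    )

prompt = rewrite_prompt("Yo, listen up, let me break it down")
```

The style parameter is what produced the contrasting "academic" and "rap" renditions above.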

Sending your text to an online service like ChatGPT, which has your phone number and credit card on file, may actually work against what you’re trying to achieve. Vicuna and gpt4alpaca are just two of the many models you can download and use offline.

Be wary of websites employing invasive CAPTCHAs (usually from a third party) that fingerprint your input events well before you finish writing something on a web form. Copy-pasting works but may flag you as a bot.

For something more serious, I’d look into the GitHub project hosted at computationalstylistics/stylo. For some theory, a book like Machine Learning Methods for Stylometry can shed more light on the topic.


Related: The NSA's Large Language Models - Conscious Digital

Stylometry remains a significant threat to online anonymity, over a decade after Snowden’s 2013 revelations. Locally deployable language models, like local LLaMA implementations, might offer a solution, providing users with more control over their digital footprint.

Interestingly, using cloud-based models like ChatGPT within secure environments (e.g., Qubes OS) could, in theory, mask individual stylometric fingerprints among a large user base. However, this potential benefit is likely outweighed by these services’ extensive tracking and data collection practices, rendering them unsafe for those seeking true anonymity.
