Stylometry circumvention

avp · April 19, 2023, 7:58pm

Stylometry is the study of an author's writing style, often used to reveal the author's identity.

Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author’s other identities,[29] which, for example, creates difficulties for whistleblowers,[30] activists,[31] and hoaxers and fraudsters.[32] The privacy risk is expected to grow as machine learning techniques and text corpora develop.[33]

This is a complex topic with a lot of depth, not only in text analysis but also in music, paintings, and other media. But a good start might be a command-line tool that analyzes basic punctuation patterns, sentence endings, and their forms. Of course, this assumes all spelling and grammatical mistakes have already been eliminated.
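As a rough illustration of that starting point, here is a minimal Python sketch that counts punctuation marks and sentence endings in a text file. The function names, the set of marks, and the file name in the usage line are my own choices for illustration, and the sentence-ending regex is naive (an abbreviation like "e.g." followed by a space counts as an ending).

```python
import re
import sys
from collections import Counter

# Marks to track; an arbitrary starting set, extend as needed.
PUNCTUATION = ".,;:!?-()\"'"


def punctuation_profile(text):
    """Return occurrences of each punctuation mark per 1000 characters."""
    counts = Counter(ch for ch in text if ch in PUNCTUATION)
    scale = 1000 / max(len(text), 1)
    return {mark: round(n * scale, 2) for mark, n in counts.items()}


def sentence_endings(text):
    """Count how sentences end: '.', '!', '?', or runs like '...' and '?!'."""
    # A run of terminators counts only when followed by whitespace or end of text.
    endings = re.findall(r"[.!?]+(?=\s|$)", text)
    return Counter(endings)


# Only runs when a file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as handle:
        text = handle.read()
    print("punctuation per 1000 chars:", punctuation_profile(text))
    print("sentence endings:", dict(sentence_endings(text)))
```

Saved as, say, `punct_profile.py` (a hypothetical name), it would be run as `python punct_profile.py draft.txt`. Comparing the output for two of your own texts gives a first feel for which habits stay constant across them.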

jerm · April 20, 2023, 1:50am

wek · April 20, 2023, 11:38am

It is possible to utilize a large language model as a means of evading stylometry, employing strategies comparable to those commonly used by students to evade plagiarism.
- academic style

Yo, listen up, let me break it down
You can drop LLM to evade stylometry, don’t be a clown
Just like the kids today, they use tactics to dodge that plagiarism trap
So get with the times, and level up your game, son.
- rap style

The quotes above are courtesy of Vicuna LLM's stylistic reinterpretation.

Sending your text to an online service like ChatGPT that has your phone number and credit card details may actually work against what you’re trying to achieve. Vicuna and gpt4alpaca are just two of the many models you can download and use offline.

Be wary of websites employing invasive CAPTCHAs (usually from a third party) that fingerprint you from input events well before you finish writing in a web form. Copy-pasting works but may flag you as a bot.

For something more serious, I’d look into the GitHub project hosted at computationalstylistics/stylo. For some theory, a book like Machine Learning Methods for Stylometry can shed more light on the topic.
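For a feel of what such tools measure, here is a toy Python sketch that compares two texts by how often they use common function words. This is not stylo's API (stylo is an R package) and not its method in any detail; the short word list, the function names, and the choice of cosine distance are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# A small hand-picked list of English function words (an assumption;
# real tools typically use many more, often the most frequent words of the corpus).
FUNCTION_WORDS = [
    "the", "of", "and", "to", "a", "in", "that", "it", "is",
    "was", "i", "for", "on", "with", "as", "but", "not", "you",
]


def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_distance(p, q):
    """0.0 means identical direction, 1.0 means no overlap at all."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    if norm == 0:
        return 1.0  # at least one text contains none of the listed words
    return 1.0 - dot / norm
```

Something like `cosine_distance(profile(original), profile(rewritten))` gives a crude number for how far an LLM rewrite moved your text. A small distance suggests the rewrite kept your habits; a large one says little about whether a serious classifier would still link the two.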

avp · April 29, 2023, 2:42am

Related: The NSA's Large Language Models - Conscious Digital

protrude_subprime416 · November 8, 2024, 3:14am

Stylometry remains a significant threat to online anonymity, over a decade after Snowden’s 2013 revelations. Locally deployable language models, like local LLaMA implementations, might offer a solution, providing users with more control over their digital footprint.

Interestingly, using cloud-based models like ChatGPT within secure environments (e.g., Qubes OS) could, in theory, mask individual stylometric fingerprints among a large user base. However, this potential benefit is likely outweighed by these services’ extensive tracking and data collection practices, rendering them unsafe for those seeking true anonymity.
