I have changed my privacy practices online a lot recently, and that’s mainly because of the rise of AI. Anyone who has used any OpenAI tech or similar tech for work or fun might understand why I feel this way.
Now massive amounts of user data can be made accessible in seconds, and narrowed down to YOU!
For example… let’s say Google or some other major website hooks up their Gemini/AI system to their user database. Now just type “Give me a summary of Ken’s web browsing and searching over the past week”.
Or “Tell me if Ken has any health related issues, what are they?”
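A hypothetical sketch of why that kind of query is trivial once an assistant sits on top of a user database: the “hard” part is an ordinary database lookup, and the model only has to summarize the rows. The table, column names, and prompt here are made up for illustration.

```python
import sqlite3

def browsing_last_week(db_path: str, user_id: str) -> list[tuple]:
    """Pull one user's browsing and search history for the past 7 days."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        """
        SELECT visited_at, url, search_query
        FROM browsing_history
        WHERE user_id = ? AND visited_at >= datetime('now', '-7 days')
        ORDER BY visited_at
        """,
        (user_id,),
    ).fetchall()
    con.close()
    return rows

# An assistant wired to this database would simply feed the rows to the model with a
# prompt like: "Summarize this user's browsing and searches over the past week."
```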
Now that’s Google as an example, but I am sure governments are going to have the same thing soon if they don’t already. So, if you’re going to use all these services that keep you in their databases, then AI can now reveal everything about you, super fast and super easily.
Am I the only one freaked out by this latest development?
It might seem strange for me to say this in a privacy-focused community, but I am not freaked out by AI. I think the gains outweigh the losses.
While you could limit your exposure or go private by using privacy tools, E2EE, or self-hosted stuff, for example, you can’t replicate what AI offers these days. It has become an important part of my workflow by now.
I don’t think just asking the AI some questions makes you less private; at least, not less private than you already are. If you want to go completely private, the effort needed goes far beyond making AI disappear.
For me, it’s more about how all the data about you is easily accessible now and can be narrowed down to you so quickly. Before, that was a bit more complicated to do. But maybe I’m wrong…
Yes, it’s true. But the introduction of AI only makes the process faster. You wouldn’t have privacy with all your data published online, whether or not AI existed. Search engines have evolved a lot and can already do/find much of the same thing as AI.
My point is, since you can’t avoid its existence, it’s better to handle it in every way you can, depending on your threat model.
Not at all, I think most people (myself included) are quite concerned about how this market will develop, and what the negative consequences will be (potential further harm to privacy is just one of many potential risks, if the industry is allowed to develop without checks and balances). I don’t trust the big tech conglomerates to develop AI in a responsible way any more than I trust them to develop web services, social networks, etc, etc.
Basically, I think AI does pose a threat to privacy, but not a fundamentally different risk than the things that came before it. Trying to characterize AI as good or bad is like trying to characterize the Internet as good or bad in the early 90s.
Like what for example? In my eyes this is a problem that predates AI by some years. It is a problem inherent to both Surveillance Capitalism (big tech systems of surveillance & data harvesting) and Dragnet Government Surveillance. AI could potentially exacerbate this problem in various ways, but currently, I don’t see real world examples of AI doing this. Before big tech was scraping the internet to train AI models, they were indexing the internet for search engines, hoovering up as much personal and private info as they could to track and target individuals etc.
Essentially, my 2c is that the technology is evolving, but the bad behavior of the major companies involved and the threats to user privacy are mostly the same as they were pre-AI.
It is understandable why anyone would want to wield some of its power. It is seductive and the allure is strong. This is probably one of the few moments that could propel you forward. Nobody wants to be left behind when their peers are all using ChatGPT/Gemini/Copilot/etc.
As usual, nerds like us want to play with it, but only in the confines of a safe self-hosted sandbox. That needs a powerful enterprise GPU with a CPU to match. It pains me that Nvidia is at the top of the game here. I hate that it is not as open-source friendly as AMD/Radeon. I think AMD is still playing catch-up.
Definitely eyeing it now. If only I wasn’t so bogged down in real life/professional life. I should probably dedicate all my free time on a certain day of the week just to do this.
Tbh, I am most likely in the same boat. I think we also need to consider whether AI can help us. For instance, imagine if F-Droid had some kind of sophisticated AI that could inspect code for privacy or security problems, or if PG had a chatbot that could help people understand their threat model or find solutions.
I wanna give one last example. So much has been said about the negative impact of chatbots on university assignments, yet some professors may find innovative ways to use them. We need to think in a similar vein.
I think in terms of privacy it’s sort of the same. I already assumed big tech and intelligence agencies could get that kind of insight easily.
My biggest concern about AI is not privacy, but access to information and human knowledge. AI-generated content can pollute (and is polluting) the internet with bullshit. All that bullshit can bury the actual content under it, making it unfindable. It also makes it harder to trust anything you see online, so even if you are lucky enough to actually be reading human-produced factual information, you might disregard it as AI-generated crap. Search can be rendered unusable, which is already happening, and bad actors could overwhelm Wikipedia moderation to the point where you either can’t even trust Wikipedia, or they are forced to lock all contributions to a handful of trusted people, at which point it sort of loses its value.
We already had a disinformation and fake news problem, where it’s easy to stumble upon disinformation. AI can make it so that finding accurate information is nearly impossible.
Good point. I do think it’s extremely weird that we are OK with using something that decides to spew out things that are completely wrong with no sources, just because it’s right maybe 60% of the time?
If they weren’t stealing content, they would be comfortable linking to or showing the source where they got the information. If they had started out showing sources (or letting the AI say something like “I don’t know”), this wouldn’t have been as big an issue; now the source websites themselves are polluted by AI and it’s too late!
Tbh, that totally depends on the implementation of the models. For instance, I use Perplexity instead of search engines most of the time, and I believe it brings better results, with sources. Like other tech, AI is not inherently good or bad; it depends on the people implementing it. We need to keep improving it.
This is the part that I think gets left out of the conversation very often.
I personally do see a lot of risk of a very degraded browsing experience, and of difficulty finding authentic, thoughtful, or informed information online, resulting in part from AI. That is a real risk for people to be concerned about. The part I think gets left out is that this is already the case to a large extent, independent of AI. I imagine the same unscrupulous content mills and SEO-gaming corporations that have already made a business out of churning out dozens of SEO-optimized crap articles and listicles, written by real humans, but humans who have no expertise and are expected to be essentially a human assembly line of content, will pivot to using AI for this purpose. AI might exacerbate this, but it isn’t anything new. Finding reliable and accurate information online, finding real sources, has been getting harder and harder since long before we could blame AI.
People point to AI not always being right or correct as if that point is some huge fatal flaw. It is certainly not ideal, but people are comparing it to an imagined hypothetical where the pre-AI internet was mostly full of accurate, balanced, technically correct info. It’s like there is a temporary amnesia towards the quality of most human-generated content on social media, SEO spam, etc. AI will get a lot of things wrong, or mis-summarize things, that is inevitable, but humans also get a ton of things wrong (one study found that 20% of left-leaning and 40% of right-leaning posts made on Facebook were factually inaccurate), and anyone who’s spent any time on Reddit knows it can be an amazing resource for finding solutions or interesting things but is also a huge source of misinformation and bias. People, even majorities of people, are wrong all the time. We are the ones training AI, we are training it on our own content, and we can’t expect it to be flawless, considering how absolutely flawed we are and how flawed much of the training data is.
Good point. I do think it’s extremely weird that we are OK with using something that decides to spew out things that are completely wrong with no sources, just because it’s right maybe 60% of the time?
I don’t think very many people do this though. Maybe it’s just because I am only barely beginning to experiment with LLMs, but I treat it more like having a conversation with someone I don’t know well, but who seems to know what they are talking about. I don’t trust it to be right, but I want to hear what it has to say, and I’ll usually only use it as a first step, or as a supplement to other research. I ask follow-up questions if I don’t think it gave a good or correct answer, and I ask it for sources when relevant. Basically, I treat it like I am talking to someone on Reddit. Neither AI, nor search engines, nor social media (nor people) should be assumed to only give accurate info. The main fear I have with AI, which I share with you, is that AI abstracts the information further from the original source; if people trust it uncritically, that could be a big problem. But that is a human problem as much as or more than it is a technical problem.
There are many big and small risks to AI, but I wish we would weigh those risks in the context of the world that exists, which is already full of misinfo and gullible humans who easily believe it, rather than comparing AI to an imagined world where the internet is full of subject-matter experts sharing only objectively true information on topics they are qualified to write about. (Not saying that’s what you are doing; I don’t think you did that. This last paragraph is just my reflection on the totality of conversations I’ve had and read on the subject.)
Brave just released a research paper on an open-source method that can be used to find out if an AI was trained on private data. This will allow for a lot more understanding and transparency of what these LLMs really know.
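For context on how this kind of auditing typically works, here is a minimal sketch of loss-based membership inference, a common approach in this research area (not necessarily the exact method in Brave’s paper). The model name and the candidate/reference strings are placeholders: text a model memorized during training tends to get a noticeably lower loss than comparable unseen text.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; a real audit targets the model under scrutiny
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def avg_token_loss(text: str) -> float:
    """Average cross-entropy the model assigns to `text` (lower = more 'familiar')."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

candidate = "Some private record the model is suspected to have seen."   # hypothetical
reference = "A freshly written control sentence of similar length and style."

# A consistently large gap between these losses is (weak) evidence that the
# candidate text was part of the training data.
print(avg_token_loss(candidate), avg_token_loss(reference))
```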
What you are talking about is essentially just aggregation of existing sources in a convenient way. This is not something new (e.g. XKeyscore).
What I’m worried about is the ability to create accurate personality models of users. While surveillance capitalists have long been in the business of stringing together profiles of every human in their reach via surveillance and inference of missing data points, I fear that LLMs can bring that to a whole new level of dystopia. I know people who use ChatGPT as a therapist. They are directly and indirectly revealing their most intimate thoughts and feelings to a technology that can conveniently make sense of that information in a fully automated fashion.
While I have no direct proof yet, looking at the current state of OpenAI, it seems quite naive to assume they are not using user data (for more than just training). They have preferential server deals with Microsoft, paying just 1/3 of market costs, and are still losing 5 billion a year. Without subsidized server bills, they would likely make losses in the range of 10 to 15 billion a year. Yes, you read that right, billions. They need to aggressively claw back every little drop of value from users in any way possible.
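To make the implied arithmetic explicit, here’s a back-of-envelope sketch; all figures are the claims above plus an assumed range for current compute spend, not verified financials.

```python
reported_annual_loss = 5e9   # claimed current loss (USD/year)
discount_factor = 3          # claimed: paying ~1/3 of market price for compute

# Assumed range for what is currently spent on (discounted) compute per year.
for subsidized_compute_spend in (2.5e9, 5e9):
    market_compute_spend = subsidized_compute_spend * discount_factor
    extra_cost = market_compute_spend - subsidized_compute_spend
    unsubsidized_loss = reported_annual_loss + extra_cost
    print(f"compute spend {subsidized_compute_spend/1e9:.1f}B -> loss ~ {unsubsidized_loss/1e9:.1f}B")
# prints 10.0B and 15.0B, matching the 10 to 15 billion range above
```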
I think it would make sense to establish some forum rules about AI. The term “AI” ranges from unhelpfully ambiguous to actively misleading. It serves the big tech narratives that AGI is “just around the corner” and that generative AI tools like language models (ChatGPT) or diffusion models (Midjourney) possess human-like intelligence (they do not).
It would make much more sense to talk about the actual technologies involved (mostly LLMs at this point, or neural networks more generally) instead of the hype word “AI”.
Usage Data: We may automatically collect information about your use of the Services, such as the types of content that you view or engage with, the features you use and the actions you take, as well as your time zone, country, the dates and times of access, user agent and version, type of computer or mobile device, and your computer connection.
Their messages to ChatGPT are literally being harvested as training data.
Agreed. People, in their stupidity, just give away personal details without even realizing, unfortunately. It’s saddening to think about.
Similarly to that, there was a data breach that exposed horrendous things which I won’t bother describing.
Exactly, that is the known part. It is assumed that model training refers to training the next big model (GPT-5) or refining the current one (RLHF & fine-tuning). This is already problematic, of course, as your private data could end up in a general-purpose model accessible to all.
Also usage data is collected, which is quite “normal” these days.
But seeing the recent additions to OpenAI (ex-NSA and ex-Palantir), I wonder what else they might do with the data. They could use a psychological personality framework (e.g. the Big Five) or something like that as a classification scheme, and then train micro models on each user (a mini LLM that tries to distill a user’s personality traits). If this is done successfully, the resulting user model would be much more powerful than the user profiles strung together by data brokers. Manipulating users would be vastly more effective.
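Purely to illustrate the kind of pipeline being speculated about here (there is no evidence OpenAI actually does this), a per-user “trait distillation” step could be as simple as the sketch below. `call_llm`, the prompt, and the trait schema are all hypothetical stand-ins.

```python
import json
from dataclasses import dataclass

@dataclass
class PersonalityProfile:
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

SCORING_PROMPT = (
    "Rate the author of the following chat messages on the Big Five traits "
    "(0.0 to 1.0 each). Reply as JSON with keys: openness, conscientiousness, "
    "extraversion, agreeableness, neuroticism.\n\n{messages}"
)

def profile_user(chat_messages: list[str], call_llm) -> PersonalityProfile:
    """Distill a per-user trait profile from raw chat logs with a single LLM call.

    `call_llm` is any function that takes a prompt string and returns the model's
    text reply (a hypothetical stand-in for a chat-completion API).
    """
    reply = call_llm(SCORING_PROMPT.format(messages="\n".join(chat_messages)))
    return PersonalityProfile(**json.loads(reply))
```

The worrying part is not the code, which is trivial, but that the operator already holds the chat logs needed to run something like this at scale.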