So ChatGPT is keeping your chat logs forever. My thoughts

So I recently learned from this reply by @rmd that OpenAI is going to keep your ChatGPT chat logs forever. The reason? Some time in late May or early June this year, a court order in the New York Times lawsuit against OpenAI required them to preserve all chat logs, including deleted chats, just because the Times thinks ChatGPT users might prompt the AI to regenerate their stupid paywalled articles.

This is absolutely absurd and violates just about every single privacy law. While personal data can be kept indefinitely if it is for “compliance with a [real] legal obligation” under the General Data Protection Regulation (GDPR) and other laws with the same framework, OpenAI is being forced to retain that data based on nothing but assumptions. ~~Also, New York Times articles are absolute bullshit and I don’t know a single person who would use AI to read one.~~

Not only that, the New York Times could have sued to force OpenAI to delete the data trained from their articles instead; that would be completely legal, and justified. But no, instead they chose to strip the right to privacy from billions of people, because of that one hypothetical person who probably doesn’t even exist and who might ask ChatGPT to summarize one of the New York Times’ biased, paywalled articles.

This entire lawsuit is built on the idea that billions of people desperately want to read New York Times articles, so desperately they’d violate copyright law to do it. Reality check: half the planet doesn’t even know what the actual fuck the New York Times is, and the other half barely cares. Furthermore, why would people ask ChatGPT, which probably doesn’t even have access to New York Times articles, when there are hundreds of other tools out there that let you read New York Times articles for free?

So why do you think the New York Times would do this when there were so many other legal options? It’s either because the people running it are just stupid asf motherfuckers, or because they want OpenAI to keep everyone’s chat logs and give them access so they (or someone else) can spy on everyone. And yeah, you might say: “But that’s just assumptions!” Well shut up you PLONKER, the entire fucking lawsuit filed by the New York Times is also based on assumptions!

If you have a New York Times subscription, unsubscribe from their shit now; they don’t deserve your money. Fuck NYT.

Edit: So I told my parents about the situation, and they say it’s probably because the New York Times just wants maximum leverage to negotiate with OpenAI over copyright, and doesn’t actually want OpenAI to retain chat logs forever. But I think there are a few billion better ways to do that than putting the rights of billions of people on the bargaining table. Even if what OpenAI did was wrong, what the New York Times is doing to those billions of people is ten times, a hundred times worse, maybe more.

4 Likes

Yes, these things are absurd. Another absurd thing is that OpenAI is violating every single copyright law too.

The reasonable thing to do would be to close OpenAI down (with all data deleted) :light_bulb:

2 Likes

In the US, AI training can use copyrighted works under fair use, as long as the use is transformative, adding something new rather than straight-up copying. The EU has a similar text-and-data-mining exception, but there the copyright holder can opt out of having their works used, especially if the AI model is commercial. OpenAI isn’t directly violating the law by using copyrighted works (unless they ignore opt-outs and DMCA takedowns), but this is still a very controversial topic and up for debate. I believe there are other solutions (e.g. AI must credit the creator; no indiscriminate scraping of copyrighted works), and closing OpenAI would do more harm than good. But this is entirely my opinion and you should take it with a grain of salt.

1 Like

This is what AI companies want us to assume.

If we look through the lens of open source development, then it becomes super clear: it’s copyright violation.

As per the ‘open source AI’ definition (Open Source AI – Open Source Initiative), such models almost don’t exist (https://osai-index.eu/), because everyone hides their illegally sourced data.

There is an interesting solution, https://openeurollm.eu/ - transparent AI funded by the EU.

I wonder how they will train such transparent models. They have to open the data but follow copyright law at the same time.

I think they will have to ask for everyone’s consent. That is the proper way to grow AI: slowly and following the law.

Slowly and transparently, because there are so many other problems with this hasty AI development behind closed doors.

2 Likes

What harm exactly would closing OpenAI (and all the other companies) cause?

I could send endless examples of why it’s good to close those companies down; for instance, a recent very good video from scientist Sabine Hossenfelder: https://inv.nadeko.net/watch?v=1MoTeoKaneU (titled “AI’s Real Threat: Mass Manipulation”).

1 Like

That could be the case, and it has happened multiple times, such as how the Internet Archive is falsely claiming archival purposes and fair use to indiscriminately scrape data and offer copyrighted material for free, and how YouTube falsely claimed DMCA safe harbour in the 2007 lawsuit brought by Viacom. I do believe that AI training does not automatically fall under fair use. However, AI training is, from a legal standpoint, transformative, as explained by Judge William Alsup in the Bartz v. Anthropic lawsuit:

[U]sing copyrighted works to train LLMs to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them—but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in a transformative use.

Cited from Two Courts Rule On Generative AI and Fair Use — One Gets It Right | Electronic Frontier Foundation

If AI companies do hide illegally sourced data, that would be tampering with evidence or obstruction of justice, which is a criminal offense in multiple jurisdictions. As of now there is no definitive proof of that, and it could just be to protect their intellectual property, though I can see how it could be used to hide their more controversial actions.

This is also, in my opinion, the only truly ethical method of training AI, and I completely agree with you, but it probably would not be so simple in practice, as a limited dataset and a low level of recognition would make it hard to compete with other LLMs.

When I try to play the video it says “The media could not be loaded, either because the server or network failed or because the format is not supported.” It would be appreciated if you could summarize the contents, thank you. Finally, I would like to point out that my original post was about the New York Times baselessly and illegally forcing indefinite data retention. I think you should make your own post about the legitimacy of AI companies such as OpenAI, as you have lots of convincing arguments, and making your own posts can get your ideas seen and recognized by more people.

1 Like

OpenAI is doing so many wrong things, and very fast. There are many absurd and nonsensical things about OpenAI and other AI companies. Neither lawmakers nor courts have time to react to this quickly moving tech, especially because it’s a new technology with little direct precedent, which makes things complicated, and the government’s reaction is not proportional to the problems at hand.

Let’s take that famous PFAS example (PFAS - Wikipedia): how long did it take for courts to decide on the harms? Or asbestos, lead, mercury, etc. Proper laws regulating fresh new advanced tech (and its consequences) don’t exist yet.

I both agree and disagree, for many reasons. Maybe it’s important to put this NYT case into the context of all the other absurdity, so as to choose the most productive thing to study. And the most productive thing is maybe the question of whether they (AI companies) should be allowed to operate at all.

People who study this topic call this competition a “race to the bottom” (Race to the bottom - Wikipedia, and the related terms in that wiki article’s “see also” section). The reasonable thing to do would be to ban this competition on AI, because it’s not good for anyone, but they still do it for money and power.

That link is to an instance of the very cool “Invidious” project (Invidious Instances - Invidious Documentation), or it used to be very cool in the past. It is an alternative way to watch YouTube, etc. But YouTube became very aggressive at blocking third parties. A lot of alternatives died because of sophisticated blocking by YouTube, but this site still survives, with issues (and a lot of work by the developers to keep it running). YouTube is probably primarily blocking not alternatives but rival AI companies from scraping content (here is an example of Google’s current video “deepfake” capabilities, Reddit - The heart of the internet, which are very impressive; other companies want that access to videos).

If you go back to that ‘Nadeko’ link, you can “Switch Backend” at the top and the video will start playing. Or, better, here is the direct YouTube link: https://youtube.com/watch?v=1MoTeoKaneU

That video is already a summary of so many things :smiley: So it’s better to watch that dense summary directly.

I don’t have much to argue against you this time; you are almost entirely correct. Though I do believe you should make a post discussing this in depth instead, as replying to my post about a different topic is not very effective.

I am not sure if I agree.

For example, does it matter much whether the NYT case causes all logs to be retained, if ChatGPT would probably retain 95% of logs anyway (and train AI on that data)?

For example, a quote from What are great and simple ChatGPT alternatives to recommend people? - Techlore Forum :

Initially, some European countries banned ChatGPT years ago, but then lifted the ban after OpenAI added an option to disable ‘chat history’. In other words, OpenAI did as little as possible to protect privacy, just enough to satisfy the demands of regulators by providing a practically unusable option.

In other words, there is no privacy with OpenAI to begin with. Just another absurd situation (highly related) where laws don’t work in practice.

I am already aware of this, but I believe what ChatGPT said about chats being deleted after 30 days is true (if not, I have legal action planned). As for using my data to train their models, I don’t have any problem with that, as long as they respect my rights and minimize and anonymize the data. Also, I do not intend to use ChatGPT in the long term, for multiple privacy and legal reasons, some of which you have mentioned above.

Last time I checked their policy, if you use their API (not the ChatGPT app), they were deleting data after 30 days. (But NanoGPT recently added a warning next to OpenAI models that even via the API your data can be used for training, for some reason; maybe because of the NYT case, or their policy changed too, or something.)
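For anyone wondering what “using the API instead of the ChatGPT app” actually looks like, here is a minimal sketch. The endpoint URL and model name are my assumptions based on OpenAI’s public documentation, and the 30-day retention is just the policy as described above, not something any code can guarantee:

```python
# Hypothetical sketch: talking to the OpenAI API directly instead of the ChatGPT app.
# Endpoint, headers, and payload shape follow OpenAI's public REST docs; the model
# name is an assumption. Retention behaviour is governed by policy, not this code.
import json

API_URL = "https://api.openai.com/v1/chat/completions"  # public chat endpoint

def build_request(prompt: str, api_key: str) -> tuple[dict, dict]:
    """Build (headers, payload) for a minimal chat completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # your secret API key
        "Content-Type": "application/json",
    }
    payload = {
        "model": "gpt-4o-mini",  # assumed model name; pick any you have access to
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, payload

# Build (but don't send) a request; actually sending it needs a real key
# and would POST `payload` as JSON to API_URL with those headers.
headers, payload = build_request("Hello", "sk-REPLACE_ME")
print(json.dumps(payload))
```

The request is only constructed here, not sent, so it runs without a key or network access; the point is just that API traffic falls under OpenAI’s API data-usage terms rather than the consumer app’s.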

Yes, if you used this app in the past, and now you cannot delete your conversations (when you thought you somewhat could), it is concerning.

That is the reason why I am against the New York Times right now instead of OpenAI: they are the ones causing this, and OpenAI is currently trying to defend the right to privacy of users like me (even though I strongly believe that is for their own benefit). Though if OpenAI wins the lawsuit against the New York Times but refuses to erase all the data they hold about me when I move on from ChatGPT, believe me, I’m gonna put them on T̸̘͉̈́h̷̻̭͛ę̵̡̽ ̸̲̬̕͝L̴̹͛í̸̝̟̀s̷̩͊̈́t̷̟̔̾.

Or, they are the ones creating precedent that OpenAI and all other AI companies cannot profit from everyone’s content unaccountably (and by extension, defending all content creators). In the end, they might have to share the profits they made from everyone’s data.

If I have not explained clearly enough, I am not siding with OpenAI because they are right, but because New York Times is clearly in the wrong as of now.

OpenAI is violating every single copyright law too.

The poor trillionaires getting screwed over by another trillionaire :sob: