28 May 2024

‘That’ll cost you, ChatGPT’ — copyright needs an update for the age of AI

CHRISTOPHER KENNEALLY

In the internet age, copyright infringement lawsuits concentrate public attention on the speed at which technology races ahead of longstanding legal precedent.

The web crossed the artificial intelligence threshold in November 2022, when OpenAI launched ChatGPT, a free-to-the-public “generative AI” tool that answers questions in human language. The latest AI-related copyright infringement lawsuits allege that chatbots, robots, and other learning machines are getting their education from commercially produced works, which are used without permission or compensation.

Large language models like ChatGPT “learn” to create images and texts after being “trained” on enormous amounts of existing works converted to data, the famous “ones and zeroes” of computer code. The BBC has reported that training sources for ChatGPT amounted to 570 GB of data, or approximately 300 billion words.

Beyond commercially published books, journals, and newspapers, AI training data draws on a vast online trove of publicly available social media posts and Wikipedia entries, as well as digitized library and museum collections, court proceedings, and government legislation and regulation.
