OpenAI says it’s “impossible” to create useful AI models without copyrighted material

ChatGPT developer OpenAI recently acknowledged the necessity of using copyrighted material in the development of AI tools like ChatGPT, The Telegraph reports, saying they would be "impossible" without it. The statement came as part of a submission to the UK's House of Lords communications and digital select committee inquiry into large language models.

AI models like ChatGPT and the image generator DALL-E gain their abilities from training sessions fed, in part, by large quantities of content scraped from the public Internet without the permission of rights holders (though in OpenAI's case, some of the training content is licensed). This sort of free-for-all scraping is part of a longstanding tradition in academic machine learning research, but because deep learning AI models went commercial recently, the practice has come under intense scrutiny.

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.
