BUSINESS

OpenAI may have violated YouTube terms of service, CEO says


It’s well known that OpenAI scrapes vast amounts of data, some of it copyrighted, from the internet to produce the uncannily human-like experience of ChatGPT. The legality of that is still a live question, as lawsuits from the New York Times and others attest. But how does it train its new video AI program, Sora?

If Sora used content from YouTube, it would be a “clear violation” of the platform’s terms of service, YouTube CEO Neal Mohan told Bloomberg.

Mohan was referring to long-standing questions about where AI companies get the content they use to train the models that power their services. While Mohan was careful to say he didn’t know whether OpenAI had used YouTube content to develop Sora, he said it would be a problem if so.

“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations,” Mohan said. “One of those expectations is that the terms of service are going to be abided by.” 

Something like having their content scraped from the platform and used by a third party would be a “clear violation of our [terms of service],” Mohan said. 

Downloading videos or transcripts would be an infringement on terms. “Those are the rules of the road in terms of content on our platform,” Mohan said.

A spokesperson for YouTube confirmed its terms of service “prohibit unauthorized scraping or downloading of YouTube content,” without elaborating on Mohan’s comments. OpenAI did not immediately respond to a request for comment. 

OpenAI admitted that it had used copyrighted data to train its AI models, saying it was “impossible” to build the technology without it. The admission came from a filing OpenAI submitted to the British House of Lords when the U.K. government was considering a new law that would limit how AI companies could use copyrighted material. 

More recently, the launch of Sora drew further scrutiny when OpenAI CTO Mira Murati was unable to answer a question about what type of content was used to train the program, and specifically whether any of it came from YouTube. “I’m actually not sure about that,” Murati told the Wall Street Journal.

Murati then added that any data used was publicly available or licensed. Mohan alluded to this interview, telling Bloomberg that they should ask OpenAI whether it had used YouTube data. “I guess they were asked,” Mohan seemed to remember midsentence, cutting himself off.

Further complicating the matter is that YouTube and Google’s parent company, Alphabet, is developing its own suite of AI tools, making it likely that Alphabet is even more concerned a potential rival might be using its content in a way that violates its terms of service. 

“Google wants that data for its own models,” Igor Jablokov, founder and CEO of AI startup Pyron, told Fortune.

The AI arms race has already kicked off a gold rush for data. Big AI players like Alphabet, Microsoft, Amazon, and Meta will want to make sure rivals don’t take the data they’ve accumulated. “They’ll all put up walled gardens as terms and conditions,” says Jablokov, whose previous voice-recognition startup was instrumental in Amazon’s subsequent creation of Alexa. 

For example, Reddit recently entered into a $60 million a year licensing agreement with Google that would see its content used to train the latter’s AI tools. Media companies have also struck similar deals with AI developers. The Associated Press has a deal with OpenAI that allows its archives to be used for training purposes. Meanwhile, German media company Axel Springer, which owns Business Insider and Politico, has a similar deal that also provides attribution in answers given by ChatGPT.
