Meta Allegedly Trained Its AI on Pirated Torrent Content

A fresh controversy is brewing around artificial intelligence: Meta faces allegations that it used pirated material from torrent sites to train Llama, the large language model (LLM) that powers Meta AI. The case is one of the first copyright lawsuits brought against a tech company over AI training.

Documents Uncover Meta AI’s Training on Pirated Materials

According to Wired, Meta was sued in 2023 for allegedly training Llama on unauthorized content. The case, “Kadrey et al. v. Meta Platforms,” was brought by authors Richard Kadrey and Christopher Golden, who accuse Meta of using copyrighted works without consent.

Meta had previously submitted redacted documents to the court, but Judge Vince Chhabria of the United States District Court for the Northern District of California ordered the unredacted versions released, and they are now public.

The released documents show discussions among Meta staff about Meta AI and Llama. In one notable exchange, an engineer expresses discomfort about “torrenting from a [Meta-owned] corporate laptop,” which lends support to the claim that the company used pirated material for AI training. Another exchange suggests that “MZ” (Mark Zuckerberg) approved the use of pirated content.

The documents indicate that Meta obtained material from LibGen, a large repository of pirated books, magazines, and academic publications. Established in Russia in 2008, LibGen has faced numerous copyright lawsuits, though the people behind the platform remain anonymous. Meta is also reported to have used material from other “shadow libraries” to train its AI models.

Meta defends its actions by arguing that it used publicly available material under the “fair use” legal doctrine, which permits the use of copyrighted content without authorization in certain circumstances, assessed case by case. The company also argues that it is merely “using text to statistically model language and generate original expression.”

What About Apple Intelligence?

This isn’t the first time a major tech company has been accused of training AI models on copyrighted material. Last year, investigations found that Apple’s OpenELM model had been trained on data that included subtitles from over 170,000 YouTube videos.

The revelation initially raised concerns that Apple was using copyrighted content for Apple Intelligence. However, the company clarified that OpenELM is an open-source model intended for research and that its dataset is not used to power Apple Intelligence.

Apple maintains that the AI features in iOS and macOS are developed using “licensed data, including selected data for specific features, as well as publicly available information gathered via our web crawler.”

It’s also worth noting that many prominent publishers, including The New York Times and The Atlantic, have opted out of having their content used to train Apple Intelligence.
