Meta’s questionable approach to data gathering to power its generative AI projects could result in significant penalties for the company, and might also set a new legal precedent for such use, after a federal judge ruled that a case brought against Meta by a group of authors will be allowed to proceed.
Back in 2023, a group of authors, including high-profile comedian Sarah Silverman, launched legal action against both Meta and OpenAI over the use of their copyrighted works to train their respective AI systems. The authors were able to show that these AI models were capable of reproducing their work in highly accurate form, which they claim demonstrates that both Meta and OpenAI used their legally protected material without consent. The lawsuit also alleges that both Meta and OpenAI removed the copyright information from their books to hide this infringement.
Meta has since sought to have the case thrown out on various legal grounds, while also seeking to shield CEO Mark Zuckerberg from having to personally appear at trial to answer for his part in the process.
That stems from uncovered internal Meta exchanges which appear to indicate that Zuckerberg himself approved the use of “likely pirated” material, as part of a broader push to get his team to build better AI models to combat the rise of OpenAI.
It seems now that more information about Meta’s processes in this case will be revealed, with the trial set to proceed, as approved by a federal judge.
As reported by TechCrunch:
“In Friday’s ruling, [Judge] Chhabria wrote that the allegation of copyright infringement is ‘obviously a concrete injury sufficient for standing’ and that the authors have also ‘adequately alleged that Meta intentionally removed CMI [copyright management information] to conceal copyright infringement.’”
As such, the case will be allowed to progress, with Zuckerberg in attendance, though the judge did dismiss the authors’ claims relating to violations of California’s fraud statute, as he found no precedent to support that element.
The case could end up being an embarrassing forced disclosure for the company, with Meta essentially required to answer for how it accessed key elements of the data set that powers its Llama AI models. That could also lead to further lawsuits for violation of copyright, and could end up costing the company billions in penalties as a result.
At the same time, the case may also set a new precedent for AI-related copyright penalties moving forward, by establishing a clear link between AI training and the illegal access of data from online repositories.
That element may already be fairly solid in legal terms, given that Meta apparently knowingly violated existing copyright law by approving the use of pirated material.
In any event, it could end up being a defining legal case in the broader AI shift, pitting high-profile artists against the tech giant.
The case is set to proceed shortly.