A Cautionary Tale: Thomson Reuters Copyright Infringement Trial and the Future of Data Use in AI Model Training

In September 2023, a US Circuit Judge sitting by designation in the District of Delaware laid the groundwork for what could be the first trial centered on whether data was used with or without authorization to train AI systems. The legal battle is between Thomson Reuters (the company that owns Westlaw and Casetext) and ROSS Intelligence, a small AI-powered legal research platform that no longer exists due in part to this lawsuit. Let’s dig into the details.

Context

In May 2020, Thomson Reuters sued ROSS Intelligence for copyright infringement in the US District Court for the District of Delaware. In the complaint, Reuters alleged that ROSS stole “critical features” of the Westlaw legal research platform to develop its own research platform. Reuters accused ROSS of using a Westlaw licensee to access and copy valuable content not for legal research but to train its own AI model and create a competing product.

In December 2020, ROSS announced it was shutting down operations because of the lawsuit. ROSS Intelligence said, “With our company ensnared by this legal battle, we have been unable to raise another round of funding to fuel our development and marketing efforts.” ROSS nonetheless committed to continuing the fight against Reuters and called the lawsuit “an attempt to use litigation to stifle a competitor.” The alleged tactic appears to have worked: ROSS announced that its platform would no longer be available as of January 31, 2021.

In September 2023, US Circuit Judge Stephanos Bibas decided that a jury must determine the outcome of this lawsuit and whether ROSS unlawfully copied content from Westlaw to train its AI-based platform. In the decision, the judge considered motions for summary judgment filed by both parties. According to the decision, ROSS Intelligence hired a company called LegalEase Solutions to draft legal memos, based on material obtained from Westlaw, that ROSS then converted into usable machine-learning training data. Reuters argued that the memo questions were essentially its own headnotes with question marks at the end and were therefore copyright-protected. ROSS admitted the Westlaw headnotes influenced its memo questions but insisted that lawyers drafted the memos and did not copy them directly from Westlaw.

The judge addressed five summary-judgment motions, finding that the nature of the copyrighted work “favors fair use, but factual questions remain” and that he “cannot yet determine the effect of the use upon the market for the work.” Reuters alleged that ROSS copied protected aspects of Westlaw through LegalEase Solutions, while ROSS disputes “almost all of Thomson Reuters’s story.” The judge concluded that it is not his role at the summary judgment stage to “sort through the evidence and tidy these factual messes”; sorting out the disputed facts is a jury’s role at trial. As such, he denied both parties’ motions for summary judgment.

The case between Reuters and ROSS goes to trial next year, and the stakes could not be higher for the AI community and the future of AI model development.

What’s at Stake?

Any AI tool is only as good as the data it is trained on. Machine learning is the process of feeding data into computer algorithms so that the answers the model produces become more refined and sophisticated over time. The more accurate the data set, the more precise the results will be.

There’s no question that AI development has become a transformative and lucrative industry. With technological advancement come necessary changes to laws and regulations. We are at an inflection point in history regarding AI development, and copyright law may need to be adjusted to account for AI technology and machine learning. Currently, many AI models are trained on vast amounts of data scraped from the internet. The programs ingest the data and, from that data, generate answers to the questions they are prompted with.

Many companies do not pay for the data their machines learn from, but a series of lawsuits has been filed in the last couple of years over this exact issue, especially in the creative industries. Check out this blog post for a deeper dive into copyright infringement and this post for a current AI-related lawsuit.

The Reuters v. ROSS litigation concerns the unauthorized use of a company’s data for AI machine learning. The debate centers on fair use versus copyright infringement. While judicial opinions and legislative materials are not protected by copyright, we are now in an era where companies want to protect their written works from being fed to machines to train them, and creators want to be compensated if their works are used to train AI models. “Copyright challenges pose a near-existential threat to existing AI models if the way they’re being trained isn’t aboveboard. If they can’t ingest mountains of data, which until now they’ve largely done without paying for that data, they won’t work.”

The outcome of the Reuters lawsuit could lay the groundwork for new precedents governing how AI engines are trained and whether tech companies must provide compensation if their chatbot’s output reproduces part or all of a copyrighted work. Further, the line between copyright infringement and fair use is likely to be redrawn as AI technology advances.

What Makes Trellis Different?

Navigating AI technology can be tricky as there are many AI platforms to choose from. If you are a legal professional thinking about incorporating AI into your practice to save time, consider Trellis Law.

Trellis is an AI-driven, state-civil trial court research and analytics platform that has democratized access to state laws and regulations by making the fragmented United States state trial court system searchable through a single interface. Trellis takes thousands of county courts and millions of documents and structures them into an immense, searchable, and organized dataset.

By aggregating data from various county courts, Trellis gives users a broader perspective and more comprehensive insight. Structured data lets lawyers and legal professionals search for specific cases, precedents, or motions and expedite their research process. Because our data is pulled directly from court dockets, our users can access actual court documents, trusting that the information provided is accurate. Contact us today for a demo.

Sources:

https://www.reuters.com/legal/thomson-reuters-ai-copyright-dispute-must-go-trial-judge-says-2023-09-26/

https://fingfx.thomsonreuters.com/gfx/legaldocs/zgvoraqlnpd/THOMSON%20REUTERS%20ROSS%20LAWSUIT%20sjruling.pdf

https://news.bloomberglaw.com/ip-law/thomson-reuters-will-head-to-trial-in-ai-model-copyright-battle

https://slate.com/technology/2023/10/artificial-intelligence-copyright-thomson-reuters-ross-intelligence-westlaw-lawsuit.html

https://money.usnews.com/investing/news/articles/2023-09-25/thomson-reuters-ai-copyright-dispute-must-go-to-trial-judge-says