
Cardozo Arts & Entertainment Law Journal

Abstract

In their high-profile suit in late 2023, The New York Times Company ("The Times") alleged that OpenAI's Generative Pre-trained Transformer (GPT) language models might output "near-verbatim" copies ("memorizations") of the works of The Times based on certain user prompts and thus might have infringed on The Times's exclusive rights over its content. One of the central issues is whether the embodiment of unauthorized reproductions of Times content in the GPT models constitutes a copyright infringement. This comment argues that OpenAI might have a colorable fair use affirmative defense despite the plaintiff's prima facie copyright infringement claim.

Existing literature on infringement and fair use analysis in the machine learning context has often lacked a close examination of the traits of the relevant technology and their legal implications. This comment attempts to overcome this limitation by adopting a holistic approach that draws on literature in both law and computer science.

On the one hand, in a recent article, Professor Michael Murray, attempting to correct several reductionist simplifications that view "AI" as a single "magic box," argued that the copyright infringement analysis of a text-to-image model's output might require insights into "the different roles of the training dataset designers, the generative AI system designers, and the end-users. . . ." Furthermore, a recent article by Professor James Grimmelmann (collaborating with two computer scientists) took a similar view.

On the other hand, the memorization phenomenon has drawn the attention of computer scientists (including OpenAI employees). Their scientific papers thoroughly explored the phenomenon with innovative empirical approaches but, by their very nature, lacked in-depth discussion of its copyright law implications.

Extrapolating Murray's deconstructive analytical framework from text-to-image models to text-generating language models, and inspired by the scientific research on memorization, this comment explores the potential legal arguments around the key direct and contributory infringement issues resulting from the verbatim copying alleged by The Times. Part I.A of this comment concisely introduces the relevant technical concepts in a way accessible to legal scholars with little background in Natural Language Processing or machine learning; Part I.B serves as a succinct case brief. Part II.A reviews the relevant scientific literature that deemed memorization a sine qua non of the state-of-the-art language model training process; Part II.A also previews some legal implications of the scientific conclusions. Part II.B reviews Murray's and Grimmelmann's efforts to differentiate the actors and stages along the generative AI supply chain, which shed light on the infringement analysis. Lastly, this comment concludes that: (1) the defendants may have a colorable fair use defense given the precedents around non-expressive copying (in Part II.C), and (2) regarding the plaintiff's contributory infringement claims, some reasonable mitigation measures may be feasible (in Part II.D).

Disciplines

Entertainment, Arts, and Sports Law | Intellectual Property Law | Law | Science and Technology Law
