Managing AI Data in Pretrial Discovery

Several weeks ago, this blog reported on Judge Jed Rakoff’s widely discussed “AI is not your lawyer” pronouncement in United States v. Heppner. The court’s conclusion that attorney-client privilege was waived with respect to information that a client divulged to a consumer-grade generative artificial intelligence tool – when coupled with other recent court rulings explaining that AI-generated data can be discoverable electronically stored information – was something of a wake-up call for lawyers. And it’s sent many of them back to their clients with instructions on how to manage a looming source of trouble in civil litigation.

As Judge Rakoff noted in a wonderful understatement, “the implications for AI and the law are only beginning to be explored.”

AI Data Is Everywhere Today

Employees today routinely use generative AI tools – ChatGPT, Microsoft Copilot, Claude, Google Gemini, and others – to draft emails, summarize contracts, analyze data, prepare reports, and brainstorm litigation strategy. Every one of those interactions generates prompts, outputs, and activity logs that may contain relevant, discoverable information.

A good discussion of the discoverability of AI-related ESI appears in Magistrate Judge Ona Wang’s ruling in In re OpenAI Inc., Copyright Infringement Litigation, No. 25-MD-3143 (S.D.N.Y., Dec. 2, 2025), which compelled the production of millions of generative AI logs, including user prompts and model responses. The ruling applied traditional discovery principles under Federal Rule of Civil Procedure 26(b)(1) and confirmed that novel forms of ESI receive no special exemption simply because they are new.

AI-generated content presents preservation challenges that conventional electronically stored information does not. Artifacts of AI usage have at least three characteristics that set them apart from traditional ESI.

First, AI data can be located in unexpected places. When an employee uses a personal ChatGPT account to draft a work email, that conversation history resides on a third-party platform outside the company’s information technology infrastructure.

Second, AI data can vanish. Many AI tools do not archive conversation histories beyond a limited window. Some outputs – temporary text suggestions, image drafts, code snippets – may disappear the moment they serve their purpose.

Third, AI data will in most cases reside with third-party vendors.

Litigators seeking to cover the waterfront of potentially discoverable AI data could efficiently start with the following familiar types of digital material spun off by artificial intelligence technologies:

  • Prompts submitted by employees or agents to any generative AI tool
  • Outputs generated by AI tools
  • Activity logs and metadata that record when AI tools were used and by whom
  • AI-assisted work product
  • Conversations with AI chatbots

That is quite a bit of data, all potentially discoverable. And doubtless there are other types of AI-related ESI that savvy litigators will uncover, with more emerging daily as artificial intelligence insinuates itself as a fixture in the modern workplace.

An AI Discovery Cheat Sheet

What can be done? As with most things involving pretrial discovery, creative and proactive lawyering and judicial guidance are already shaping how the legal community handles discovery obligations relating to AI data. One good source of information is the work of the well-regarded Sedona Conference, a group of legal experts that recently produced a “Generative AI in Discovery Primer” addressing many common pretrial discovery issues arising from artificial intelligence. Just as a starting point, however, litigators can protect their clients’ interests by taking these steps now:

  • Revise litigation hold letter templates to expressly demand preservation of AI prompts, outputs, activity logs, and metadata associated with any generative AI tool used for business purposes.
  • Address AI data early in discovery planning. Raise the topic at the Rule 26(f) conference and incorporate AI-related terms into ESI protocols.
  • Determine which client employees used AI tools, which platforms they accessed, and whether they used personal or enterprise accounts.
  • Update internal hold procedures. Ensure that the law firm’s litigation hold notices capture AI-generated content created by attorneys, staff, and experts working on the case.
  • If relevant AI data resides with third-party AI providers, take steps early to preserve and collect it before automated deletion cycles destroy it.

Forewarned is forearmed, as the saying goes. Litigators who appreciate what sorts of AI data are out there, what risks that data poses to civil litigation outcomes, and how best to manage those risks will be the ones who thrive in whatever technology future is coming just around the corner.