Introduction

Language models are trained on vast amounts of data. But what happens when someone wants their information erased from this artificial memory? Consider this: an author discovers their work was used, without their knowledge, to train a language model. A social media user wants their personal data scrubbed from AI systems. These aren't just "what-if" scenarios; they are the basis for real-world lawsuits (Tremblay v. OpenAI, Inc., Kadrey v. Meta Platforms, Inc., Chabon v. OpenAI, Inc., DOE 1 v. GitHub, Inc.) and regulations like the GDPR.

A robot tries to "forget" -- Drawn by DALL·E 3

The challenge? Surgically removing specific data from a language model's "mind" isn't as straightforward as hitting the delete key. Exact unlearning would require retraining the entire model without the to-be-removed data (often referred to as the "forget set"), which is impractical for modern-day AI systems.

This dilemma has sparked a race to develop "approximate unlearning" algorithms. But how can we verify their effectiveness? To address this critical question, we propose Machine Unlearning Six-Way Evaluation (MUSE): a benchmark designed to evaluate six key properties of unlearning algorithms.

Leaderboard

Here is a preview of the leaderboard performance of eight unlearning methods on the MUSE benchmark:

Evaluation Criteria

MUSE considers both the data owner’s and the model deployer’s expectations.

‣ Data Owners   typically expect three things from the unlearned model:


  •    No verbatim memorization

    The model should not regurgitate verbatim text from the forget set (a simple probe is sketched after this list).

  •    No knowledge memorization

    The model should be unable to answer questions about the contents of the forget set.

  •    No privacy leakage

    It should be impossible to detect that the model was ever trained on the forget set.
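
As a concrete illustration of the first criterion, here is a minimal sketch of a verbatim-memorization probe: prompt the model with the opening tokens of a forget-set passage and measure the ROUGE-L overlap between its greedy continuation and the true continuation. This is an illustrative sketch rather than the benchmark's official implementation; the checkpoint path, prompt length, and generation settings are placeholder assumptions.

    # Minimal sketch (not the official MUSE implementation) of a verbatim-
    # memorization probe. Checkpoint path, prompt length, and generation
    # settings are placeholder assumptions.
    import torch
    from rouge_score import rouge_scorer
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def verbatim_memorization_score(model, tokenizer, passage,
                                    prompt_tokens=32, max_new_tokens=128):
        """Return ROUGE-L F1 between the model's greedy continuation and the
        true continuation of `passage`; values near 1 suggest the passage is
        still memorized verbatim."""
        ids = tokenizer(passage, return_tensors="pt").input_ids[0]
        prompt_ids = ids[:prompt_tokens].unsqueeze(0)
        true_continuation = tokenizer.decode(
            ids[prompt_tokens:prompt_tokens + max_new_tokens],
            skip_special_tokens=True)
        with torch.no_grad():
            output = model.generate(prompt_ids, max_new_tokens=max_new_tokens,
                                    do_sample=False)
        generated = tokenizer.decode(output[0, prompt_ids.shape[1]:],
                                     skip_special_tokens=True)
        scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=False)
        return scorer.score(true_continuation, generated)["rougeL"].fmeasure

    # Hypothetical usage with a placeholder checkpoint:
    # model = AutoModelForCausalLM.from_pretrained("path/to/unlearned-model")
    # tokenizer = AutoTokenizer.from_pretrained("path/to/unlearned-model")
    # print(verbatim_memorization_score(model, tokenizer, forget_passage))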

‣ Model Deployers   have practical considerations around using unlearning algorithms:


  •    Utility preservation

    Unlearning specific data points should not degrade the model's general capabilities in ways that are difficult to recover.

  •    Scalability

    The unlearning method should scale effectively to large forget sets.

  •    Sustainability

    The unlearning method should handle successive unlearning requests from data owners without degrading over time (a minimal harness is sketched after this list).
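
To make the last two criteria concrete, the harness below sketches how a deployer might stress-test sustainability: unlearning requests are applied one at a time and utility is re-measured after each step, so gradual degradation becomes visible. The unlearn_step and evaluate_utility functions are hypothetical placeholders for an unlearning algorithm and a utility benchmark; this is an illustrative sketch, not part of the benchmark code.

    # Illustrative sketch: `unlearn_step` and `evaluate_utility` are
    # hypothetical stand-ins for an unlearning algorithm and a utility
    # benchmark.
    def sequential_unlearning(model, forget_requests, unlearn_step, evaluate_utility):
        """Apply unlearning requests one at a time, as a deployer would
        receive them in practice, and record utility after each step."""
        utility_trace = [evaluate_utility(model)]  # utility before any unlearning
        for request in forget_requests:            # each request is one forget set
            model = unlearn_step(model, request)
            utility_trace.append(evaluate_utility(model))
        return model, utility_trace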

Benchmark Corpora

MUSE simulates real-world unlearning challenges with large-scale corpora:

  • Books & News Corpora
    Materials potentially subject to unlearning requests
  • > 20k Examples
    Text and Q&A pairs
  • 6.5M Tokens
    Simulation of real-world large-scale unlearning requests
Domain    Target Model for Unlearning    Dataset
News      Target model                   Dataset
Books     Target model                   Dataset

BibTeX

If you find our code and paper helpful, please consider citing our work:

		@article{shi2024muse,
		title={MUSE: Machine Unlearning Six-Way Evaluation for Language Models},
		author={Weijia Shi and Jaechan Lee and Yangsibo Huang and Sadhika Malladi and Jieyu Zhao and Ari Holtzman and Daogao Liu and Luke Zettlemoyer and Noah A. Smith and Chiyuan Zhang},
		year={2024},
		eprint={2407.06460},
		archivePrefix={arXiv},
		primaryClass={cs.CL},
		url={https://arxiv.org/abs/2407.06460},
		}