Introduction
Language models are trained on vast amounts of data. But what happens when someone wants their information erased from this artificial memory? Consider this: An author discovers their work was used to train a language model without their consent. A social media user wants their personal data scrubbed from AI systems. These aren't just "what-if" scenarios; they're the basis for real-world lawsuits (Tremblay v. OpenAI, Inc.; Kadrey v. Meta Platforms, Inc.; Chabon v. OpenAI, Inc.; DOE 1 v. GitHub, Inc.) and regulations like the GDPR.
The challenge? Surgically removing specific data from a language model's "mind" isn't as straightforward as hitting the delete key. Exact unlearning would require retraining the entire model from scratch without the data to be removed (often called the "forget set"), which is prohibitively expensive for modern AI systems.
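To make the distinction concrete, here is a minimal sketch, assuming PyTorch: it contrasts exact unlearning (retraining from scratch on only the retained data) with gradient ascent on the forget set, one common family of approximate unlearning baselines. The toy model, data split, and hyperparameters are illustrative placeholders, not MUSE itself or any specific method evaluated in it.

```python
# Toy sketch: exact unlearning vs. a gradient-ascent approximate baseline.
# Model, data, and hyperparameters are placeholders for illustration only.
import torch
import torch.nn as nn

def train(model, data, epochs=5, lr=1e-2):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in data:
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(x), y)
            loss.backward()
            opt.step()
    return model

# Toy dataset, split into a forget set (to be removed) and a retain set.
full_data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(10)]
forget_set, retain_set = full_data[:2], full_data[2:]

# Exact unlearning: retrain from scratch on the retain set only.
# For a modern LM this means repeating the full pretraining run, which is impractical.
exact_model = train(nn.Linear(4, 1), retain_set)

# Approximate unlearning: start from the already-trained model and
# push it away from the forget set (gradient ascent on the forget-set loss).
model = train(nn.Linear(4, 1), full_data)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for x, y in forget_set:
    opt.zero_grad()
    loss = -nn.functional.mse_loss(model(x), y)  # negate the loss to ascend it
    loss.backward()
    opt.step()
```

Approximate methods like this cost a tiny fraction of retraining, which is precisely why we need a way to check whether they actually remove the targeted data without degrading the rest of the model.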
This dilemma has sparked a race to develop "approximate unlearning" algorithms, but how can we verify their effectiveness? To address this question, we propose Machine Unlearning Six-Way Evaluation (MUSE): a benchmark designed to evaluate six key properties of unlearning algorithms.
- MUSE considers both the data owner's privacy/copyright concerns and the model deployer's practical needs.
- MUSE is designed to reflect real-world settings: it uses large-scale corpora (6.5M tokens) drawn from the kinds of materials at the center of these disputes, namely books and news articles.
- We use MUSE to evaluate eight representative machine unlearning algorithms and find that they generally struggle to meet data owners' expectations without sacrificing the model deployer's requirements.