Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Minseo Kwak, Jaehyung Kim
Yonsei University

Gap-K% is a reference-free pretraining data detection method based on the top-1 prediction gap with sliding-window smoothing.

Abstract

The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the divergence from the model’s top-1 prediction and the local correlations between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model’s top-1 prediction and the target token induce strong gradient signals and are explicitly penalized during training. Motivated by this, Gap-K% leverages the log-probability gap between the top-1 predicted token and the target token, incorporating a sliding-window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.

Motivation and Insight

  • When the model’s top-1 prediction differs from the target token, that top-1 token receives the strongest suppressing gradient among all non-target tokens during training (the gradient identity below makes this precise).
  • We therefore hypothesize that the log-probability gap between the top-1 prediction and the target token provides an informative signal for distinguishing training data from unseen data.
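
The gradient claim follows from the standard softmax cross-entropy identity, a textbook fact rather than a result specific to Gap-K%. Writing \(z_v\) for the pre-softmax logit of token \(v\) at position \(t\) (notation introduced here for illustration), the next-token loss \(\mathcal{L}_t = -\log p(x_t \mid x_{<t})\) satisfies

$$\frac{\partial \mathcal{L}_t}{\partial z_v} = p(v \mid x_{<t}) - \mathbb{1}[v = x_t], \qquad v \in \mathcal{V}$$

For every non-target token the gradient equals its predicted probability, so among the tokens \(v \neq x_t\) the top-1 prediction is pushed down hardest whenever it differs from the target; training therefore tends to shrink the top-1 gap on sequences the model has seen.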

For additional details and formal definitions, please refer to the paper.

Method

  1. Measure the top-1 prediction gap: At each token position, Gap-K% measures how much the target token deviates from the model’s top-1 prediction:

     $$g_t = \frac{\log p(x_t \mid x_{<t}) - \max_{v \in \mathcal{V}} \log p(v \mid x_{<t})}{\sigma_t}$$

    • \(x_t\): target token at position \(t\)
    • \(v\): a token in the vocabulary \(\mathcal{V}\)
    • \(\sigma_t\): standard deviation of the next-token log-probabilities at position \(t\)
  2. Apply sequential smoothing: Instead of treating tokens independently, Gap-K% averages gap scores over a sliding window to capture local correlations and reduce token-level fluctuations.
  3. Average the lowest K% regions: Finally, Gap-K% averages the lowest K% of window-level scores, focusing on the most informative regions where the model deviates most from its top-1 predictions (see the code sketch after this list).
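
A minimal sketch of this pipeline is given below. It is written against the definitions in this section rather than the authors' released code: the function and variable names are ours, the window size and K are placeholders, and \(\sigma_t\) is computed as the probability-weighted standard deviation of the next-token log-probabilities (a Min-K%++-style normalizer), which is one plausible reading of the definition above.

```python
import numpy as np

def gap_k_score(log_probs: np.ndarray, target_ids: np.ndarray,
                window: int = 5, k: float = 0.2) -> float:
    """Sketch of Gap-K% scoring: higher (less negative) scores look more member-like.

    log_probs:  (T, V) next-token log-probabilities; row t is the model's
                distribution over the vocabulary given the prefix x_{<t}.
    target_ids: (T,) observed token x_t at each position.
    window:     sliding-window length for sequential smoothing.
    k:          fraction of lowest window scores to average (e.g. 0.2 for K=20%).
    """
    probs = np.exp(log_probs)

    # sigma_t: probability-weighted std of the next-token log-probabilities
    # (assumed here; the paper may define the normalizer differently).
    mu = (probs * log_probs).sum(axis=-1)
    sigma = np.sqrt((probs * (log_probs - mu[:, None]) ** 2).sum(axis=-1))

    # Step 1: normalized gap between the target token and the top-1 prediction (g_t <= 0).
    target_lp = log_probs[np.arange(len(target_ids)), target_ids]
    g = (target_lp - log_probs.max(axis=-1)) / (sigma + 1e-8)

    # Step 2: sequential smoothing via a sliding-window moving average.
    if len(g) >= window:
        g = np.convolve(g, np.ones(window) / window, mode="valid")

    # Step 3: average the lowest K% of window-level scores.
    n_lowest = max(1, int(len(g) * k))
    return float(np.sort(g)[:n_lowest].mean())
```

In practice, log_probs and target_ids would come from a single forward pass of the target model over the candidate text, and a threshold (or AUROC over a labeled benchmark) turns the resulting score into a membership decision.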

Why Does It Work?

Conceptual comparison of token-level scores used in Min-K%++ and Gap-K%

Comparison of token-level scores \(z_t\) (Min-K%++ [1]) and \(g_t\) (Gap-K%) under two probability distributions. Min-K%++ subtracts the mean of the next-token log-probabilities instead of the top-1 log-probability.

The blue hatched bar denotes the target token \(x_t\), while the yellow hatched bar indicates the top-1 predicted token. In (a), the distribution is relatively flat, whereas in (b) it is sharply peaked at the top-1 token.

While Min-K%++ assigns the same score to both distributions, Gap-K% distinguishes them, treating a larger gap between the top-1 prediction and the target token as stronger evidence of unseen data.
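
To make the contrast concrete, the toy snippet below computes both token-level scores for a single position. It follows the Min-K%++ definition of Zhang et al. (2025) as we read it (mean and standard deviation of the log-probabilities taken under the model's next-token distribution) together with the Gap-K% definition above; the two example distributions are ours and do not reproduce the figure.

```python
import numpy as np

def token_scores(probs: np.ndarray, target: int) -> tuple[float, float]:
    """Return (z_t, g_t) for one position: Min-K%++-style vs. Gap-K%-style score."""
    log_p = np.log(probs)
    mu = (probs * log_p).sum()                      # mean log-prob under the model
    sigma = np.sqrt((probs * (log_p - mu) ** 2).sum())
    z_t = (log_p[target] - mu) / sigma              # Min-K%++: distance from the mean
    g_t = (log_p[target] - log_p.max()) / sigma     # Gap-K%:  distance from the top-1
    return float(z_t), float(g_t)

# (a) a relatively flat vs. (b) a sharply peaked distribution; the target (index 1)
# is not the top-1 token in either case.
print(token_scores(np.array([0.40, 0.35, 0.25]), target=1))
print(token_scores(np.array([0.70, 0.20, 0.10]), target=1))
```

The difference lies in the reference point: \(z_t\) measures how far the target sits from the mean of the distribution, while \(g_t\) measures how far it sits below the top-1 prediction, so only \(g_t\) directly tracks the size of the top-1 gap.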

[1] Zhang et al. (2025), Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models, ICLR

Results

We evaluate Gap-K% on two benchmarks, WikiMIA and MIMIR. Across both benchmarks, Gap-K% consistently improves performance over baselines.
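
Both benchmarks are scored as a binary membership task: each text receives a Gap-K% score and AUROC is computed over member versus non-member labels. The snippet below is a minimal sketch with synthetic placeholder scores; the real scores would come from running the target model over each benchmark example (e.g. via the gap_k_score sketch above).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic placeholder scores standing in for per-example Gap-K% values.
member_scores = rng.normal(loc=-0.7, scale=0.3, size=100)      # label 1: pretraining member
non_member_scores = rng.normal(loc=-1.3, scale=0.3, size=100)  # label 0: unseen text

labels = np.r_[np.ones(100), np.zeros(100)]
scores = np.r_[member_scores, non_member_scores]

# AUROC treats higher scores as more member-like, matching the sign convention of the
# gap score (values closer to zero mean the target matched the top-1 more often).
print(f"AUROC: {roc_auc_score(labels, scores):.3f}")
```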

WikiMIA

On WikiMIA, Gap-K% improves average AUROC by 9.7 percentage points over the average of prior baselines and by 2.4 percentage points over Min-K%++.

WikiMIA results table

MIMIR

On MIMIR, Gap-K% achieves the highest average AUROC across Pythia models ranging from 1.4B to 12B, demonstrating strong performance on a challenging benchmark.

MIMIR results table