Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data

Minseo Kwak, Jaehyung Kim
Yonsei University

Gap-K% is a reference-free pretraining data detection method based on the top-1 prediction gap with sliding-window smoothing.

Abstract

The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the divergence from the model’s top-1 prediction and the local correlations between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model’s top-1 prediction and the target token induce strong gradient signals and are explicitly penalized during training. Motivated by this, Gap-K% leverages the log-probability gap between the top-1 predicted token and the target token, incorporating a sliding-window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.

Motivation and Insight

  • When the model’s top-1 prediction differs from the target token, that top-1 token receives the strongest suppressing gradient among all non-target tokens during training (the gradient identity below makes this precise).
  • We therefore hypothesize that the log-probability gap between the top-1 prediction and the target token provides an informative signal for distinguishing training data from unseen data.
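
The gradient claim follows from the standard softmax cross-entropy identity, a textbook fact rather than a result specific to Gap-K%. Writing \(z_v\) for the pre-softmax logit of token \(v\) at position \(t\) (notation introduced here for illustration), the next-token loss \(\mathcal{L}_t = -\log p(x_t \mid x_{<t})\) satisfies

$$\frac{\partial \mathcal{L}_t}{\partial z_v} = p(v \mid x_{<t}) - \mathbb{1}[v = x_t], \qquad v \in \mathcal{V}$$

For every non-target token the gradient equals its predicted probability, so among the tokens \(v \neq x_t\) the top-1 prediction is pushed down hardest whenever it differs from the target; training therefore tends to shrink the top-1 gap on sequences the model has seen.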

For additional details and formal definitions, please refer to the paper.

Method

  1. Measure the top-1 prediction gap: At each token position, Gap-K% measures how much the target token deviates from the model’s top-1 prediction:

     $$g_t = \frac{\log p(x_t \mid x_{<t}) - \max_{v \in \mathcal{V}} \log p(v \mid x_{<t})}{\sigma_t}$$

    • \(x_t\): target token at position \(t\)
    • \(v\): a token in the vocabulary \(\mathcal{V}\)
    • \(\sigma_t\): standard deviation of the next-token log-probabilities at position \(t\)
  2. Apply sequential smoothing: Instead of treating tokens independently, Gap-K% averages gap scores over a sliding window to capture local correlations and reduce token-level fluctuations.
  3. Average the lowest K% regions: Finally, Gap-K% averages the lowest K% of window-level scores, focusing on the most informative regions where the model deviates most from its top-1 predictions (see the code sketch after this list).
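
A minimal sketch of this pipeline is given below. It is written against the definitions in this section rather than the authors' released code: the function and variable names are ours, the window size and K are placeholders, and \(\sigma_t\) is computed as the probability-weighted standard deviation of the next-token log-probabilities (a Min-K%++-style normalizer), which is one plausible reading of the definition above.

```python
import numpy as np

def gap_k_score(log_probs: np.ndarray, target_ids: np.ndarray,
                window: int = 5, k: float = 0.2) -> float:
    """Sketch of Gap-K% scoring: higher (less negative) scores look more member-like.

    log_probs:  (T, V) next-token log-probabilities; row t is the model's
                distribution over the vocabulary given the prefix x_{<t}.
    target_ids: (T,) observed token x_t at each position.
    window:     sliding-window length for sequential smoothing.
    k:          fraction of lowest window scores to average (e.g. 0.2 for K=20%).
    """
    probs = np.exp(log_probs)

    # sigma_t: probability-weighted std of the next-token log-probabilities
    # (assumed here; the paper may define the normalizer differently).
    mu = (probs * log_probs).sum(axis=-1)
    sigma = np.sqrt((probs * (log_probs - mu[:, None]) ** 2).sum(axis=-1))

    # Step 1: normalized gap between the target token and the top-1 prediction (g_t <= 0).
    target_lp = log_probs[np.arange(len(target_ids)), target_ids]
    g = (target_lp - log_probs.max(axis=-1)) / (sigma + 1e-8)

    # Step 2: sequential smoothing via a sliding-window moving average.
    if len(g) >= window:
        g = np.convolve(g, np.ones(window) / window, mode="valid")

    # Step 3: average the lowest K% of window-level scores.
    n_lowest = max(1, int(len(g) * k))
    return float(np.sort(g)[:n_lowest].mean())
```

In practice, log_probs and target_ids would come from a single forward pass of the target model over the candidate text, and a threshold (or AUROC over a labeled benchmark) turns the resulting score into a membership decision.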

Why Does It Work?

Conceptual comparison of token-level scores used in Min-K%++ and Gap-K%

Comparison of token-level scores \(z_t\) (Min-K%++ [1]) and \(g_t\) (Gap-K%) under two probability distributions. Min-K%++ subtracts the mean of the next-token log-probabilities instead of the top-1 log-probability.

The blue hatched bar denotes the target token \(x_t\), while the yellow hatched bar indicates the top-1 predicted token. In (a), the distribution is relatively flat, whereas in (b) it is sharply peaked at the top-1 token.

While Min-K%++ assigns the same score to both distributions, Gap-K% distinguishes them, treating a larger gap between the top-1 prediction and the target token as stronger evidence of unseen data.
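
To make the contrast concrete, the toy snippet below computes both token-level scores for a single position. It follows the Min-K%++ definition of Zhang et al. (2025) as we read it (mean and standard deviation of the log-probabilities taken under the model's next-token distribution) together with the Gap-K% definition above; the two example distributions are ours and do not reproduce the figure.

```python
import numpy as np

def token_scores(probs: np.ndarray, target: int) -> tuple[float, float]:
    """Return (z_t, g_t) for one position: Min-K%++-style vs. Gap-K%-style score."""
    log_p = np.log(probs)
    mu = (probs * log_p).sum()                      # mean log-prob under the model
    sigma = np.sqrt((probs * (log_p - mu) ** 2).sum())
    z_t = (log_p[target] - mu) / sigma              # Min-K%++: distance from the mean
    g_t = (log_p[target] - log_p.max()) / sigma     # Gap-K%:  distance from the top-1
    return float(z_t), float(g_t)

# (a) a relatively flat vs. (b) a sharply peaked distribution; the target (index 1)
# is not the top-1 token in either case.
print(token_scores(np.array([0.40, 0.35, 0.25]), target=1))
print(token_scores(np.array([0.70, 0.20, 0.10]), target=1))
```

The difference lies in the reference point: \(z_t\) measures how far the target sits from the mean of the distribution, while \(g_t\) measures how far it sits below the top-1 prediction, so only \(g_t\) directly tracks the size of the top-1 gap.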

[1] Zhang et al. (2025), Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models, ICLR

Results

We evaluate Gap-K% on two benchmarks, WikiMIA and MIMIR. Across both benchmarks, Gap-K% consistently improves performance over baselines.
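
Both benchmarks are scored as a binary membership task: each text receives a Gap-K% score and AUROC is computed over member versus non-member labels. The snippet below is a minimal sketch with synthetic placeholder scores; the real scores would come from running the target model over each benchmark example (e.g. via the gap_k_score sketch above).

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Synthetic placeholder scores standing in for per-example Gap-K% values.
member_scores = rng.normal(loc=-0.7, scale=0.3, size=100)      # label 1: pretraining member
non_member_scores = rng.normal(loc=-1.3, scale=0.3, size=100)  # label 0: unseen text

labels = np.r_[np.ones(100), np.zeros(100)]
scores = np.r_[member_scores, non_member_scores]

# AUROC treats higher scores as more member-like, matching the sign convention of the
# gap score (values closer to zero mean the target matched the top-1 more often).
print(f"AUROC: {roc_auc_score(labels, scores):.3f}")
```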

WikiMIA

On WikiMIA, Gap-K% improves average AUROC by 9.7 percentage points over the average of prior baselines and by 2.4 percentage points over Min-K%++.

WikiMIA results table

MIMIR

On MIMIR, Gap-K% achieves the highest average AUROC across Pythia models ranging from 1.4B to 12B, demonstrating strong performance on a challenging benchmark.

MIMIR results table