Gap-K% is a reference-free pretraining data detection method based on the top-1 prediction gap with sliding-window smoothing.
The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the divergence from the model's top-1 prediction and the local correlations between adjacent tokens. In this work, we propose Gap-K%, a novel pretraining data detection method grounded in the optimization dynamics of LLM pretraining. By analyzing the next-token prediction objective, we observe that discrepancies between the model's top-1 prediction and the target token induce strong gradient signals and are therefore explicitly penalized during training. Motivated by this, Gap-K% leverages the log-probability gap between the top-1 predicted token and the target token, incorporating a sliding-window strategy to capture local correlations and mitigate token-level fluctuations. Extensive experiments on the WikiMIA and MIMIR benchmarks demonstrate that Gap-K% achieves state-of-the-art performance, consistently outperforming prior baselines across various model sizes and input lengths.
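To see why such discrepancies matter during training (an illustration based on the standard softmax cross-entropy gradient, not a derivation taken from the paper), note that for the next-token loss \(\ell_t\) the gradient with respect to the logit \(z_{t,v}\) of token \(v\) is

$$\frac{\partial \ell_t}{\partial z_{t,v}} = p(v \mid x_{<t}) - \mathbf{1}[v = x_t],$$

so whenever the model concentrates probability on a top-1 token other than the target, both that token and the target receive large-magnitude gradients; this is exactly the discrepancy that the Gap-K% score measures.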
For additional details and formal definitions, please refer to the paper. The Gap-K% token-level score is defined as
$$g_t = \frac{\log p(x_t \mid x_{<t}) - \max_{v \in \mathcal{V}} \log p(v \mid x_{<t})}{\sigma_t}$$
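For intuition, the following is a minimal sketch of how such a score could be computed with PyTorch and Hugging Face Transformers. It assumes, following the Min-K%++ convention referenced in the caption below, that \(\sigma_t\) is the standard deviation of the next-token log-probability distribution, and it uses an illustrative window size, an illustrative value of K, and a bottom-K% mean as the sequence-level aggregation; these choices are assumptions, so consult the paper for the exact procedure.

```python
# Minimal sketch of a Gap-K%-style score (not the official implementation).
# Assumptions: sigma_t is the std of the next-token log-probabilities (as in
# Min-K%++), scores are smoothed with a mean filter of size `window`, and the
# sequence score is the mean of the lowest k% of smoothed token scores.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def gap_k_score(text, model, tokenizer, k=0.2, window=5):
    ids = tokenizer(text, return_tensors="pt").input_ids          # (1, T)
    with torch.no_grad():
        logits = model(ids).logits                                # (1, T, V)
    logp = torch.log_softmax(logits[0, :-1].float(), dim=-1)      # predicts tokens 1..T-1
    targets = ids[0, 1:]

    target_logp = logp.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # log p(x_t | x_<t)
    top1_logp = logp.max(dim=-1).values                               # max_v log p(v | x_<t)

    # mu_t / sigma_t of the next-token log-probability distribution
    probs = logp.exp()
    mu = (probs * logp).sum(dim=-1)
    sigma = (probs * (logp - mu.unsqueeze(-1)) ** 2).sum(dim=-1).sqrt()

    g = (target_logp - top1_logp) / (sigma + 1e-6)    # per-token gap score g_t (<= 0)

    # sliding-window smoothing to capture local correlations between tokens
    g_smooth = torch.nn.functional.avg_pool1d(
        g.view(1, 1, -1), kernel_size=window, stride=1,
        padding=window // 2, count_include_pad=False,
    ).view(-1)

    # aggregate the lowest k% of smoothed scores into a sequence-level score
    k_len = max(1, int(k * g_smooth.numel()))
    return g_smooth.sort().values[:k_len].mean().item()

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(gap_k_score("The quick brown fox jumps over the lazy dog.", model, tokenizer))
```

A higher (less negative) score indicates that the target tokens rarely deviate from the model's top-1 predictions, which, under this scoring scheme, is treated as evidence that the text was seen during pretraining.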
Comparison of token-level scores \(z_t\) (Min-K%++ [1]) and \(g_t\) (Gap-K%) under two probability distributions. Min-K%++ subtracts the mean of the next-token distribution instead of the top-1 prediction.
The blue hatched bar denotes the target token \(x_t\), while the yellow hatched bar indicates the top-1 predicted token. In (a), the distribution is relatively flat, whereas in (b) it is sharply peaked at the top-1 token.
While Min-K%++ assigns the same score to both distributions, Gap-K% distinguishes them, treating a larger gap between the top-1 prediction and the target token as stronger evidence of unseen data.
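For a concrete sense of how the two token scores differ, the short sketch below computes both on two hand-picked toy distributions. These distributions are illustrative only and are not the matched pair from the figure; the point is simply that \(z_t\) is referenced to the distribution mean while \(g_t\) is referenced to the top-1 prediction.

```python
# Toy comparison of the two token-level scores: z_t (Min-K%++-style, target
# log-prob minus the distribution mean, over sigma) and g_t (Gap-K%, target
# log-prob minus the top-1 log-prob, over sigma). Illustrative values only.
import torch

def token_scores(probs, target_idx):
    logp = probs.log()
    mu = (probs * logp).sum()                        # mean of next-token log-probs
    sigma = ((probs * (logp - mu) ** 2).sum()).sqrt()
    z = (logp[target_idx] - mu) / sigma              # Min-K%++-style score
    g = (logp[target_idx] - logp.max()) / sigma      # Gap-K% score
    return z.item(), g.item()

flat = torch.tensor([0.35, 0.30, 0.20, 0.15])    # (a) relatively flat, target = index 1
peaked = torch.tensor([0.85, 0.07, 0.05, 0.03])  # (b) sharply peaked at top-1, target = index 1

for name, p in [("flat", flat), ("peaked", peaked)]:
    z, g = token_scores(p, target_idx=1)
    print(f"{name:>6}: z_t = {z:+.3f}, g_t = {g:+.3f}")
```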
We evaluate Gap-K% on two benchmarks, WikiMIA and MIMIR. Across both benchmarks, Gap-K% consistently improves performance over baselines.
On WikiMIA, Gap-K% improves average AUROC by 9.7 percentage points over the average of prior baselines and by 2.4 percentage points over Min-K%++.
On MIMIR, Gap-K% achieves the highest average AUROC across Pythia models ranging from 1.4B to 12B, demonstrating strong performance on a challenging benchmark.