Policy | 2026-06-18

DRIFT: Refining Instruction Data via On-Policy Data Attribution

arXiv:2606.18307v1 Announce Type: cross Abstract: Optimizing the training data distribution for Supervised Fine-Tuning (SFT) dictates the capability of Large Language Models (LLMs). While existing data curation methods excel at accelerating training under constrained budgets, they are less suited...

What Happened

A new paper, DRIFT: Refining Instruction Data via On-Policy Data Attribution, proposes a method for improving the instruction data used to fine-tune large language models (LLMs). Its core insight is that the training examples in a supervised fine-tuning (SFT) dataset do not contribute equally, or even positively, to a model’s downstream performance. DRIFT introduces an on-policy data attribution technique that scores each training example by its actual effect on the model’s behavior during fine-tuning, rather than by static heuristics or off-policy scoring. The method then iteratively refines the training distribution by removing or down-weighting examples whose impact is negative or negligible, tailoring the data mix to a given model and task.
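The remove-or-down-weight loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the names refine_weights, iterative_refinement, train_fn, and score_fn are hypothetical, as are the threshold tol and the factor shrink. The scoring function is left abstract because this summary does not specify DRIFT's estimator.

```python
import numpy as np

def refine_weights(weights, scores, tol=1e-3, shrink=0.5):
    """One refinement step over per-example sampling weights.

    Assumption: a positive score means the example helps and a negative
    score means it hurts. tol and shrink are illustrative values only.
    """
    weights = np.asarray(weights, dtype=float).copy()
    scores = np.asarray(scores, dtype=float)
    harmful = scores <= -tol            # clearly negative impact: remove
    negligible = np.abs(scores) < tol   # near-zero impact: down-weight
    weights[harmful] = 0.0
    weights[negligible] *= shrink
    return weights

def iterative_refinement(n_examples, train_fn, score_fn, rounds=3):
    """Alternate between training on the current mix and re-scoring it."""
    weights = np.ones(n_examples)
    for _ in range(rounds):
        model = train_fn(weights)   # fine-tune on the current data mix
        scores = score_fn(model)    # re-score against the updated model
        weights = refine_weights(weights, scores)
    return weights

# Toy usage with stand-in callables (no real model is trained here).
one_step = refine_weights([1.0, 1.0, 1.0], [0.5, -0.2, 0.0001])
two_rounds = iterative_refinement(
    3,
    train_fn=lambda w: w.sum(),
    score_fn=lambda model: np.array([1.0, -1.0, 0.0]),
    rounds=2,
)
```

Re-scoring inside the loop is what makes the sketch on-policy in spirit: each round's scores come from the model trained on the previous round's data mix, not from a fixed reference model.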

Why It Matters

The quality of instruction data is arguably the single most important lever for improving LLM performance after pre-training. Yet most current curation approaches, such as perplexity filtering, diversity sampling, or scoring with external reward models, are static, model-agnostic, or both. They do not account for the fact that the same data point can help one model and harm another, or that its utility changes as the model learns. DRIFT addresses this by making data selection on-policy: it measures attribution using the model’s own gradients and loss landscape during training. This is a significant conceptual shift from off-policy methods, which treat data quality as an intrinsic property of an example rather than a relationship between the example and the model being trained.
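To make the idea of gradient-based, model-relative attribution concrete, here is a minimal sketch of one common first-order estimator: the dot product between a training example's gradient and the mean gradient on a held-out batch. It is an assumption that this resembles what DRIFT computes; the summary does not give the paper's estimator. A tiny logistic-regression model in NumPy stands in for an LLM, and all function names and data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def per_example_grads(w, X, y):
    """Gradient of the logistic loss for each row of X, shape (n, d)."""
    return (sigmoid(X @ w) - y)[:, None] * X

def attribution_scores(w, X_train, y_train, X_val, y_val):
    """First-order influence of each training example on held-out loss.

    A gradient step on example i changes the held-out loss by roughly
    -lr * (g_i . g_val), so a positive score means the example helps
    at the current parameters w.
    """
    g_train = per_example_grads(w, X_train, y_train)
    g_val = per_example_grads(w, X_val, y_val).mean(axis=0)
    return g_train @ g_val

# Two training examples share the same input but carry opposite labels;
# the held-out example agrees with the first one.
X_train = np.array([[1.0, 0.0], [1.0, 0.0]])
y_train = np.array([1.0, 0.0])
X_val = np.array([[1.0, 0.0]])
y_val = np.array([1.0])

early = attribution_scores(np.zeros(2), X_train, y_train, X_val, y_val)
late = attribution_scores(np.array([3.0, 0.0]), X_train, y_train, X_val, y_val)
# early is [0.25, -0.25]; late differs because the scores depend on w.
```

The same two examples receive different scores at the two parameter settings, which illustrates the relational view of data quality: the score is a property of the example and the current model together, so it has to be recomputed as training progresses.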

For AI practitioners, the practical implication is more efficient and targeted fine-tuning. Instruction datasets can contain hundreds of thousands of examples, many of them noisy, redundant, or misaligned with the target use case, and DRIFT offers a principled way to prune and reweight them. This can lower training costs, improve model reliability, and reduce the risk of catastrophic forgetting or of unintended biases being baked in during SFT. The method is particularly relevant for teams working with limited compute budgets or deploying specialized models in domains like medicine, law, or customer support, where data quality directly affects safety and accuracy.

Implications for AI Practitioners

First, DRIFT suggests that data curation should be treated as an iterative, model-specific process rather than a one-time preprocessing step. Practitioners may need to rethink their SFT pipelines to incorporate attribution feedback loops. Second, the method highlights the growing importance of interpretability tools for training dynamics: understanding why a particular example helps or hurts is becoming as important as the example itself. Third, DRIFT adds computational overhead, since attribution requires additional forward and backward passes, but the trade-off may be favorable for high-stakes applications where data quality is paramount. Finally, the approach aligns with broader trends in alignment and safety research: moving from static, rule-based data filtering to dynamic, model-aware data optimization.

Key Takeaways

DRIFT introduces an on-policy attribution method that evaluates each training example’s impact during SFT, enabling dynamic data refinement.
The approach is presented as an improvement over static or off-policy curation because it accounts for model-specific and training-stage-specific data utility.
For practitioners, DRIFT offers a path to more efficient fine-tuning with fewer, higher-quality examples, reducing cost and risk.
The method underscores a shift toward interpretable, iterative data optimization in LLM development, with direct implications for safety and alignment.
