Paved with True Intents:
Intent-Aware Training Improves LLM Safety Classification Across Training Regimes

Jeremias Ferrao¹*, Niclas Müller-Hof¹*, Iustin Sîrbu², Traian Rebedea²,³, Yftah Ziser¹,³
¹ University of Groningen, ² University POLITEHNICA of Bucharest, ³ NVIDIA
* Equal contribution.
Figure: Intent-aware training overview. We treat user intent as an explicit signal between the prompt and the harm label, and study how modeling it improves safety classifiers across training regimes.
Figure: Latency vs. F1 Pareto frontier. Our intent-aware models form the inference latency–F1 Pareto frontier across five external safety benchmarks.

Abstract

We argue that safety classifiers should model user intent as an explicit signal between the prompt and the final label. To study this, we introduce AIMS, a human-annotated dataset of 1,724 difficult safety prompts, each paired with an intent description and a harm label. We use AIMS to evaluate intent-aware training across supervised fine-tuning, preference learning, reasoning distillation, and reinforcement learning. Despite its small size, AIMS enables competitive safety classifiers across training regimes: DPO from model-generated intent errors improves over SFT, and intent-conditioned distillation outperforms reasoning-only distillation in most teacher–student pairs. Most notably, directly rewarding intent faithfulness with GRPO yields the strongest average performance across five external safety benchmarks, and our intent-aware models form the inference latency–F1 Pareto frontier. These results show that faithful intent modeling is a compact, high-quality supervision signal for more robust safety classifiers.
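For readers unfamiliar with the term, the following is a minimal sketch of how a latency–F1 Pareto frontier can be computed from a set of (latency, F1) measurements. It is an illustration only, not the authors' evaluation code: the `ModelPoint` type, the `pareto_frontier` helper, and the model names and numbers are all hypothetical placeholders, not results from the paper.

```python
from typing import List, NamedTuple


class ModelPoint(NamedTuple):
    """One classifier's operating point (hypothetical structure)."""
    name: str
    latency_ms: float  # inference latency, lower is better
    f1: float          # F1 score, higher is better


def pareto_frontier(points: List[ModelPoint]) -> List[ModelPoint]:
    """Return the points that no other point dominates.

    A point is dominated when another point is at least as fast and at
    least as accurate, and strictly better on one of the two axes.
    Exact duplicates are kept only once.
    """
    # Sort by latency ascending; on equal latency, put higher F1 first.
    ordered = sorted(points, key=lambda p: (p.latency_ms, -p.f1))
    frontier: List[ModelPoint] = []
    best_f1 = float("-inf")
    for point in ordered:
        # Every earlier point is at least as fast, so this point survives
        # only if it is strictly more accurate than all of them.
        if point.f1 > best_f1:
            frontier.append(point)
            best_f1 = point.f1
    return frontier


# Made-up numbers for illustration; these are NOT results from the paper.
points = [
    ModelPoint("model_a", 20.0, 0.80),
    ModelPoint("model_b", 35.0, 0.78),   # slower and less accurate than model_a
    ModelPoint("model_c", 50.0, 0.86),
    ModelPoint("model_d", 120.0, 0.85),  # slower and less accurate than model_c
]

frontier_names = [p.name for p in pareto_frontier(points)]
print(frontier_names)
```

In this reading, a model "on the frontier" is one for which no alternative is both faster and more accurate; in the toy data above, `model_b` and `model_d` are excluded because `model_a` and `model_c` dominate them.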

BibTeX

@misc{ferrao2026paved,
  title         = {Paved with True Intents: Intent-Aware Training Improves LLM Safety Classification Across Training Regimes},
  author        = {Ferrao, Jeremias and M{\"u}ller-Hof, Niclas and S{\^i}rbu, Iustin and Rebedea, Traian and Ziser, Yftah},
  year          = {2026},
  eprint        = {XXXX.XXXXX},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}