Policy · 2026-04-22
LEPO: Latent Reasoning Policy Optimization for Large Language Models
Source: arXiv cs.AI
arXiv:2604.17892v2 · Announce Type: replace-cross

Abstract: Recently, latent reasoning has been introduced into large language models (LLMs) to leverage rich information within a continuous space. However, without stochastic sampling, these methods inevitably collapse to deterministic inference,...
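The abstract is truncated here, so the following is only a toy illustration of its stated premise, not LEPO's method. A latent "reasoning" rollout with no sampling always produces the same trajectory. Under a group-normalized advantage (a GRPO-style choice assumed here for illustration), identical rollouts then receive zero learning signal. The 2x2 map `W`, the reward function, and the Gaussian noise are all hypothetical stand-ins.

```python
import math
import random

# Hypothetical toy latent step: a fixed 2x2 linear map followed by tanh.
# NOT the paper's model; it only contrasts deterministic and stochastic rollouts.
W = [[0.9, -0.4], [0.3, 0.8]]


def latent_step(h, noise_std=0.0, rng=None):
    """One continuous reasoning step, with optional Gaussian noise on the latent."""
    out = []
    for row in W:
        z = math.tanh(sum(w * x for w, x in zip(row, h)))
        if noise_std > 0.0:
            z += rng.gauss(0.0, noise_std)  # stochasticity injected in latent space
        out.append(z)
    return out


def rollout(h0, steps, noise_std=0.0, rng=None):
    """Apply latent_step repeatedly, starting from the initial latent h0."""
    h = list(h0)
    for _ in range(steps):
        h = latent_step(h, noise_std, rng)
    return h


def reward(h):
    # Hypothetical scalar reward: the first coordinate of the final latent.
    return h[0]


def group_advantages(rewards):
    """Group-normalized advantages: (r - mean) / std, or all zeros if every reward is equal."""
    if max(rewards) == min(rewards):
        return [0.0 for _ in rewards]
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / std for r in rewards]


h0 = [0.5, -0.2]
rng = random.Random(0)

# Eight rollouts per group: deterministic vs. noisy latent propagation.
det = [reward(rollout(h0, steps=4)) for _ in range(8)]
sto = [reward(rollout(h0, steps=4, noise_std=0.1, rng=rng)) for _ in range(8)]

print("distinct deterministic rewards:", len(set(det)))
print("deterministic advantages:", group_advantages(det))
print("distinct stochastic rewards:", len(set(sto)))
```

In the deterministic group every rollout is identical, so every advantage is zero. The noisy group yields distinct rewards and therefore a nonzero signal. Injecting Gaussian noise is just one assumed way to restore exploration. The truncated abstract does not show how LEPO itself introduces stochasticity.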