BeClaude
Policy | 2026-04-22

LEPO: Latent Reasoning Policy Optimization for Large Language Models

Source: arXiv cs.AI

arXiv:2604.17892v2 | Announce Type: replace-cross

Abstract: Recently, latent reasoning has been introduced into large language models (LLMs) to leverage rich information within a continuous space. However, without stochastic sampling, these methods inevitably collapse to deterministic inference,...

Tags: arxiv, papers, reasoning