BeClaude
Research, 2026-04-20

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

Source: arXiv cs.AI

arXiv:2604.16042v1 (announce type: cross)

Abstract: While Large Language Models (LLMs) have achieved strong performance across many NLP tasks, their opaque internal mechanisms hinder trustworthiness and safe deployment. Existing surveys in explainable AI largely focus on post-hoc explanation methods...

arxiv, papers