Research 2026-04-20

Towards Intrinsic Interpretability of Large Language Models: A Survey of Design Principles and Architectures

arXiv:2604.16042v1 Announce Type: cross Abstract: While Large Language Models (LLMs) have achieved strong performance across many NLP tasks, their opaque internal mechanisms hinder trustworthiness and safe deployment. Existing surveys in explainable AI largely focus on post-hoc explanation methods...

Read Original Article on Arxiv CS.AI
