Research 2026-04-22
MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
Source: arXiv cs.AI
arXiv:2604.06798v4 (announce type: replace-cross)
Abstract: Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binarization provides extreme efficiency, yet existing binary methods designed for dense LLMs...
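The abstract's premise rests on weight binarization. As a rough illustration of the generic idea (not the paper's MoBiE method, whose details are not given here), a weight matrix W is commonly approximated as alpha * sign(W), where alpha is a scalar scale such as the mean absolute value; the function name below is a hypothetical example:

```python
import numpy as np

def binarize_weights(W):
    """Approximate W as alpha * sign(W).

    Generic weight-binarization sketch: alpha is the mean absolute
    value of W, and B holds only +1/-1 entries. This is an assumed
    textbook scheme, not the specific method from the paper.
    """
    alpha = float(np.abs(W).mean())
    B = np.sign(W)
    B[B == 0] = 1.0  # map exact zeros to +1 so every entry is +/-1
    return alpha, B

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
alpha, B = binarize_weights(W)
W_hat = alpha * B  # binary reconstruction: 1 bit per weight plus one scale
```

Storing B (1 bit per weight) plus a single float alpha is what gives binarization its extreme memory savings relative to 16-bit dense weights.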