Research 2026-04-22
MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
Source: arXiv cs.AI
arXiv:2604.06798v4 (announce type: replace-cross)
Abstract: Mixture-of-Experts (MoE) based large language models (LLMs) offer strong performance but suffer from high memory and computation costs. Weight binarization provides extreme efficiency, yet existing binary methods designed for dense LLMs...
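The abstract's premise rests on weight binarization. As a rough illustration of the generic idea (not the paper's MoBiE method, whose details are not given here), a weight matrix W is commonly approximated as alpha * sign(W), where alpha is a scalar scale such as the mean absolute value; the function name below is a hypothetical example:

```python
import numpy as np

def binarize_weights(W):
    """Approximate W as alpha * sign(W).

    Generic weight-binarization sketch: alpha is the mean absolute
    value of W, and B holds only +1/-1 entries. This is an assumed
    textbook scheme, not the specific method from the paper.
    """
    alpha = float(np.abs(W).mean())
    B = np.sign(W)
    B[B == 0] = 1.0  # map exact zeros to +1 so every entry is +/-1
    return alpha, B

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
alpha, B = binarize_weights(W)
W_hat = alpha * B  # binary reconstruction: 1 bit per weight plus one scale
```

Storing B (1 bit per weight) plus a single float alpha is what gives binarization its extreme memory savings relative to 16-bit dense weights.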