MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Recent advances in Large Vision-Language Models (LVLMs) have shown that scaling these frameworks significantly boosts performance across a variety of downstream tasks. LVLMs such as MiniGPT-4, LLaVA, and others have achieved remarkable capabilities by incorporating visual projection layers and an …
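To make the "visual projection layer" mentioned above concrete, here is a minimal sketch of one common design: an MLP that maps frozen vision-encoder patch features into the LLM's token-embedding space so image patches can be fed to the language model as soft tokens. The class name, dimensions (1024-d vision features, 4096-d LLM embeddings), and two-layer MLP structure are illustrative assumptions, not the exact projector of any specific model named here.

```python
import torch
import torch.nn as nn

class VisualProjection(nn.Module):
    """Illustrative projector: maps vision-encoder patch features
    into the LLM embedding space. Dimensions are assumptions
    (e.g. a CLIP ViT-L/14 encoder feeding a 4096-d LLM)."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # A two-layer MLP with GELU, one common projector choice;
        # some LVLMs use a single linear layer instead.
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        # returns visual tokens: (batch, num_patches, llm_dim)
        return self.proj(patch_features)

# Usage: project 256 patch embeddings into the LLM token space.
tokens = VisualProjection()(torch.randn(1, 256, 1024))
print(tokens.shape)  # torch.Size([1, 256, 4096])
```

The projected visual tokens are then concatenated with the text token embeddings, which is what lets a pretrained LLM attend over image content without modifying its tokenizer.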
