Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Advances in large language models have significantly accelerated progress in natural language processing (NLP). The introduction of the transformer framework proved to be a milestone, enabling a new wave of language models, including OPT …

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

Recent advancements in Large Vision Language Models (LVLMs) have shown that scaling these frameworks significantly boosts performance across a variety of downstream tasks. LVLMs, including MiniGPT, LLaVA, and others, have achieved remarkable capabilities by incorporating visual projection layers and an …
