dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

☆3,328

Alternatives and similar repositories for MGM

Users who are interested in MGM are comparing it to the repositories listed below.

PKU-YuanGroup / MoE-LLaVA
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
☆2,278 Updated 4 months ago
zai-org / CogVLM2
GPT4V-level open-source multi-modal model based on Llama3-8B
☆2,427 Updated 9 months ago
mini-sora / minisora
MiniSora: A community that aims to explore the implementation path and future development direction of Sora.
☆1,271 Updated 9 months ago
PKU-YuanGroup / Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
☆3,405 Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆1,974 Updated 3 weeks ago
lichao-sun / Mora
Mora: More like Sora for Generalist Video Generation
☆1,582 Updated last year
LargeWorldModel / LWM
Large World Model -- Modeling Text and Video with Millions Context
☆7,380 Updated last year
InternLM / InternLM-XComposer
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
☆2,905 Updated 6 months ago
ali-vilab / VGen
Official repo for VGen: a holistic video generation ecosystem built on diffusion models
☆3,147 Updated 10 months ago
aixcoder-plugin / aiXcoder-7B
Official repository of the aiXcoder-7B Code Large Language Model
☆2,276 Updated 4 months ago
baaivision / Emu
Emu Series: Generative Multimodal Models from BAAI
☆1,761 Updated last year
LLaVA-VL / LLaVA-NeXT
☆4,429 Updated 2 months ago
NVlabs / VILA
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou…
☆3,679 Updated last week
YangLing0818 / RPG-DiffusionMaster
[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
☆1,836 Updated 10 months ago
Alpha-VLLM / Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
☆2,236 Updated 9 months ago
Vchitect / Latte
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
☆1,895 Updated last month
ytongbai / LVM
☆1,838 Updated last year
InternLM / xtuner
A Next-Generation Training Engine Built for Ultra-Large MoE Models
☆5,008 Updated this week
BAAI-DCAI / Bunny
A family of lightweight multimodal models.
☆1,047 Updated last year
baaivision / Emu3
Next-Token Prediction is All You Need
☆2,257 Updated 2 weeks ago
facebookresearch / jepa
PyTorch code and models for V-JEPA self-supervised learning from video.
☆3,289 Updated 9 months ago
myshell-ai / JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
☆988 Updated last year
DLYuanGod / TinyGPT-V
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
☆1,304 Updated last year
Ucas-HaoranWei / Vary-toy
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
☆625 Updated 11 months ago
Meituan-AutoML / MobileVLM
Strong and Open Vision Language Assistant for Mobile Devices
☆1,303 Updated last year
zai-org / CogVLM
A state-of-the-art-level open visual language model | multimodal pre-trained model
☆6,705 Updated last year
NUS-HPC-AI-Lab / VideoSys
VideoSys: An easy and efficient system for video generation
☆2,009 Updated 3 months ago
Alpha-VLLM / LLaMA2-Accessory
An Open-source Toolkit for LLM Development
☆2,794 Updated 10 months ago
facebookresearch / chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
☆2,068 Updated last year
NExT-GPT / NExT-GPT
Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model
☆3,596 Updated 6 months ago