dvlab-research / MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
★3,332 · Updated last year
Alternatives and similar repositories for MGM
Users interested in MGM are comparing it to the repositories listed below.
- 【TMM 2025🔥】Mixture-of-Experts for Large Vision-Language Models ★2,298 · Updated 6 months ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ★1,979 · Updated 2 months ago
- GPT4V-level open-source multi-modal model based on Llama3-8B ★2,427 · Updated 10 months ago
- MiniSora: a community project that aims to explore the implementation path and future development direction of Sora. ★1,279 · Updated 11 months ago
- 【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection ★3,445 · Updated last year
- InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions ★2,908 · Updated 8 months ago
- Mora: More like Sora for Generalist Video Generation ★1,582 · Updated last year
- A Next-Generation Training Engine Built for Ultra-Large MoE Models ★5,061 · Updated this week
- Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. ★2,081 · Updated last year
- Emu Series: Generative Multimodal Models from BAAI ★1,763 · Updated 2 weeks ago
- [TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation. ★1,910 · Updated 2 months ago
- Official repository of the aiXcoder-7B Code Large Language Model ★2,275 · Updated 6 months ago
- Code and models for ICML 2024 paper, NExT-GPT: Any-to-Any Multimodal Large Language Model ★3,616 · Updated 8 months ago
- Large World Model: modeling text and video with millions of tokens of context ★7,391 · Updated last year
- [ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG) ★1,843 · Updated 11 months ago
- Next-Token Prediction is All You Need ★2,281 · Updated 2 weeks ago
- Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models ★3,151 · Updated last year
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud ★3,731 · Updated 2 months ago
- [ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models. ★1,890 · Updated last year
- A family of lightweight multimodal models. ★1,049 · Updated last year
- Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary) ★629 · Updated last year
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ★4,054 · Updated last year
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones ★1,306 · Updated last year
- Reaching LLaMA2 Performance with 0.1M Dollars ★986 · Updated last year
- VideoSys: An easy and efficient system for video generation ★2,016 · Updated 5 months ago
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ★849 · Updated 5 months ago
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ★864 · Updated 8 months ago
- Lumina-T2X is a unified framework for Text to Any Modality Generation ★2,248 · Updated 11 months ago