Beckschen / LLaVolta
[NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression
☆37 Updated 3 months ago
Related projects
Alternatives and complementary repositories for LLaVolta
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆40 Updated last week
- Official implementation of MIA-DPO ☆32 Updated this week
- 🔥 ImageFolder: Autoregressive Image Generation with Folded Tokens ☆53 Updated 3 weeks ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆34 Updated 3 weeks ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆37 Updated last week
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆32 Updated 4 months ago
- 🔥 Stable, simple, state-of-the-art VQVAE toolkit & cookbook ☆40 Updated 4 months ago
- ☆29 Updated last week
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆28 Updated last week
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆41 Updated last week
- This is a repo to track the latest autoregressive visual generation papers. ☆41 Updated 3 weeks ago
- ☆35 Updated last month
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 Updated 4 months ago
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆50 Updated last month
- Official repo for StableLLAVA ☆90 Updated 10 months ago
- Adaptive Caching for Faster Video Generation with Diffusion Transformers ☆60 Updated this week
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆52 Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆83 Updated 2 weeks ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆115 Updated last month
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. ☆40 Updated 3 weeks ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆40 Updated 3 months ago
- ☆20 Updated 3 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆35 Updated 7 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 Updated last month
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image … ☆51 Updated 3 weeks ago
- ☆57 Updated last year
- ☆55 Updated 6 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆50 Updated 5 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆23 Updated this week