rese1f / aurora
🔥 Aurora Series: A more efficient multimodal large language model series for video.
☆47 Updated this week
Related projects
Alternatives and complementary repositories for aurora
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆32 Updated 5 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆39 Updated 3 months ago
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆36 Updated last month
- ☆36 Updated last month
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆77 Updated 7 months ago
- ☆20 Updated 3 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 Updated 2 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆43 Updated 3 weeks ago
- Official implementation of MIA-DPO ☆39 Updated 2 weeks ago
- TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation ☆26 Updated last week
- Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding ☆23 Updated 2 weeks ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆49 Updated 2 months ago
- ☆19 Updated 11 months ago
- IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆25 Updated last month
- ☆12 Updated last month
- [NeurIPS2024] Official code for (IMA) Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs ☆14 Updated last month
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆89 Updated last week
- Official Repository of Personalized Visual Instruct Tuning ☆24 Updated 2 weeks ago
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data ☆32 Updated 8 months ago
- FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax ☆18 Updated 11 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models ☆50 Updated 5 months ago
- ☆30 Updated 3 weeks ago
- This is the official repo for the upcoming work: ByteVideoLLM ☆14 Updated 2 weeks ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆55 Updated 3 weeks ago
- [NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. ☆40 Updated last month
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆66 Updated 5 months ago
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆34 Updated 2 weeks ago
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆23 Updated 9 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆31 Updated this week