Yxxxb / VoCo-LLaMA
VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆95 · Updated 7 months ago
Alternatives and similar repositories for VoCo-LLaMA:
Users interested in VoCo-LLaMA are comparing it to the repositories listed below.
- [NeurIPS 2024] Efficient Large Multi-modal Models via Visual Context Compression ☆51 · Updated this week
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆49 · Updated last month
- Official implementation of MIA-DPO ☆49 · Updated 3 weeks ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆98 · Updated last week
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆19 · Updated 2 months ago
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆128 · Updated last month
- This repo contains evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆115 · Updated 7 months ago
- [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆85 · Updated 6 months ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆156 · Updated 4 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆37 · Updated 3 months ago
- [TMLR] Public code repo for the paper "A Single Transformer for Scalable Vision-Language Modeling" ☆127 · Updated 3 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆115 · Updated 9 months ago
- ☆110 · Updated 6 months ago
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding ☆62 · Updated last month
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆73 · Updated 3 weeks ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆51 · Updated last month
- [ACL '24 Oral] Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆59 · Updated 5 months ago
- ☆132 · Updated 4 months ago
- [NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models" ☆165 · Updated 4 months ago
- [ICLR 2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆65 · Updated 2 weeks ago
- A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo ☆32 · Updated 6 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" ☆58 · Updated last year
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆141 · Updated 3 weeks ago
- ☆137 · Updated 3 months ago
- Code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" ☆87 · Updated 2 months ago
- Official repo for StableLLAVA ☆94 · Updated last year
- ☆34 · Updated last month
- Official implementation of the Law of Vision Representation in MLLMs ☆149 · Updated 2 months ago
- ☆58 · Updated last month