zehanwang01 / OmniBind
☆27 Updated 2 months ago
Alternatives and similar repositories for OmniBind:
Users who are interested in OmniBind are comparing it to the libraries listed below.
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆18 Updated last month
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆32 Updated last month
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding ☆18 Updated 2 weeks ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆91 Updated 6 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 Updated 5 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆32 Updated this week
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆45 Updated this week
- ☆26 Updated 5 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆28 Updated 3 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆56 Updated 4 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆20 Updated 5 months ago
- Official implementation of MIA-DPO ☆49 Updated 2 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 7 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆54 Updated 7 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆56 Updated last year
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆18 Updated 2 weeks ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆32 Updated 7 months ago
- The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". Th… ☆42 Updated 2 months ago
- ☆87 Updated last year
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆42 Updated last month
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆25 Updated 6 months ago
- [NeurIPS 2024] Efficient Multi-modal Models via Stage-wise Visual Context Compression ☆50 Updated 5 months ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆60 Updated 4 months ago
- 🔥 Aurora Series: A more efficient multimodal large language model series for video. ☆62 Updated 2 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆63 Updated 2 months ago
- Explore the Limits of Omni-modal Pretraining at Scale ☆96 Updated 4 months ago
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024) ☆50 Updated this week
- Official repo for StableLLAVA ☆94 Updated last year
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆19 Updated 3 months ago
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆14 Updated 3 months ago