zehanwang01 / FreeBind
☆18 Updated 7 months ago
Alternatives and similar repositories for FreeBind:
Users who are interested in FreeBind are comparing it to the libraries listed below.
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆28 Updated 3 months ago
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆18 Updated last month
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆58 Updated 4 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 Updated 5 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆26 Updated 2 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆32 Updated 7 months ago
- ☆27 Updated 2 months ago
- ☆44 Updated 8 months ago
- Official implementation of MIA-DPO ☆49 Updated 2 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆20 Updated 5 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆14 Updated 3 months ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆18 Updated 2 weeks ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆49 Updated last year
- ☆37 Updated 2 months ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" ☆73 Updated last month
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆30 Updated last week
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs ☆33 Updated 2 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback