QC-LY / UniBind
The source code for "UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All"
☆43 · Updated last year
Alternatives and similar repositories for UniBind
Users interested in UniBind are comparing it to the repositories listed below.
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities ☆99 · Updated last year
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆69 · Updated 3 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆86 · Updated 11 months ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers" ☆241 · Updated last year
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering ☆36 · Updated last month
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph ☆27 · Updated this week
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆64 · Updated 2 weeks ago
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆187 · Updated 3 weeks ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆82 · Updated last year
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning ☆130 · Updated last year
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations ☆178 · Updated 6 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆108 · Updated 2 weeks ago
- [NeurIPS 2024] Visual Perception by Large Language Model's Weights ☆45 · Updated 4 months ago
- Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models" ☆17 · Updated 4 months ago
- [CVPR 2025] Official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models" ☆180 · Updated last month
- [CVPR 2025] FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression ☆47 · Updated 5 months ago
- ☆69 · Updated last year
- Video Chain of Thought: code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" ☆157 · Updated 5 months ago
- ☆32 · Updated 4 months ago
- ✨ A curated list of papers on uncertainty in multimodal large language models (MLLMs) ☆52 · Updated 4 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆90 · Updated 3 months ago
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models ☆26 · Updated last month
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆88 · Updated 9 months ago
- ☆21 · Updated 3 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆65 · Updated 2 months ago
- Code for the ICLR 2025 paper "Towards Semantic Equivalence of Tokenization in Multimodal LLM" ☆70 · Updated 3 months ago
- Code for DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models ☆63 · Updated 3 weeks ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆102 · Updated 4 months ago
- Official code for the paper "GRIT: Teaching MLLMs to Think with Images" ☆115 · Updated last week
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆129 · Updated 5 months ago