QC-LY / UniBind

The source code for "UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All"

☆48

Alternatives and similar repositories for UniBind

Users who are interested in UniBind are comparing it to the repositories listed below.

z-x-yang / DoraemonGPT
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
☆87 Updated last year
mrwu-mac / ControlMLLM
[NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"
☆196 Updated 4 months ago
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆84 Updated last year
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆127 Updated 3 months ago
JiuTian-VL / MoME
[NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models
☆74 Updated 6 months ago
ChocoWu / SeTok
Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆75 Updated 6 months ago
Hoar012 / RAP-MLLM
[CVPR 2025] RAP: Retrieval-Augmented Personalization
☆74 Updated 3 months ago
Dongping-Chen / ISG
(ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.
☆31 Updated 3 months ago
zifuwan / ONLY
[ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
☆41 Updated 4 months ago
amazon-science / QA-ViT
☆69 Updated last year
ziqipang / LM4VisualEncoding
[ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
☆244 Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆91 Updated last year
yaolinli / DeCo
Code for DeCo: Decoupling token compression from semantic abstraction in multimodal large language models
☆74 Updated 4 months ago
zehanwang01 / FreeBind
☆22 Updated 6 months ago
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆132 Updated 2 months ago
isekai-portal / Link-Context-Learning
☆100 Updated last year
ExplainableML / flair
[CVPR 2025] FLAIR: VLM with Fine-grained Language-informed Image Representations
☆113 Updated 2 months ago
mbzuai-oryx / VideoGLaMM
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆90 Updated 7 months ago
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆29 Updated last year
scofield7419 / Video-of-Thought
Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"
☆169 Updated 8 months ago
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" (ICLR 2025)
☆85 Updated 8 months ago
AILab-CVC / M2PT
[CVPR 2024] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
☆100 Updated last year
Becomebright / ReKV
Official PyTorch Code of ReKV (ICLR'25)
☆66 Updated 2 weeks ago
zhousheng97 / EgoTextVQA
[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
☆41 Updated 5 months ago
yu-rp / apiprompting
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
☆106 Updated last year
Ruiyang-061X / Awesome-MLLM-Uncertainty
✨A curated list of papers on the uncertainty in multi-modal large language model (MLLM).
☆54 Updated 7 months ago
double125 / MADTP
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
☆48 Updated last year
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆194 Updated 5 months ago
Shengcao-Cao / groundLMM
Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision
☆41 Updated 3 weeks ago
zhangquanchen / VisRL
[ICCV 2025] VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
☆41 Updated last week