QC-LY / UniBind
The source code for "UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All"
☆39Updated 10 months ago
Alternatives and similar repositories for UniBind:
Users who are interested in UniBind are comparing it to the libraries listed below.
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆144Updated last month
- Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆94Updated 2 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision☆30Updated 4 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆37Updated 4 months ago
- [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"☆231Updated last year
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆81Updated 5 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)☆70Updated 4 months ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆67Updated 4 months ago
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities☆98Updated 11 months ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models☆45Updated 2 months ago
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆39Updated last month
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆80Updated 11 months ago
- [CVPR 2024] ViT-Lens: Towards Omni-modal Representations☆170Updated 2 weeks ago
- ☆103Updated last week
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023)☆55Updated 10 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆105Updated last month
- Explore the Limits of Omni-modal Pretraining at Scale☆96Updated 5 months ago
- MLLM-Tool: A Multimodal Large Language Model For Tool Agent Learning☆109Updated 9 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆40Updated last month
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation☆53Updated 3 weeks ago
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆47Updated last month
- Visual self-questioning for large vision-language assistant.☆40Updated 4 months ago
- ☆27Updated 3 months ago
- Official implementation of MIA-DPO☆49Updated last month
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆39Updated 4 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆32Updated 11 months ago
- MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer☆38Updated 5 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.☆26Updated 9 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆55Updated 8 months ago