Liuziyu77 / RAR
The official implementation of RAR
☆61Updated 5 months ago
Related projects: ⓘ
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆85Updated last week
- Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024)☆41Updated 3 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆42Updated 3 months ago
- official impelmentation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input☆44Updated 3 weeks ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"☆62Updated 4 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆75Updated 5 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆103Updated 3 weeks ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs☆45Updated last month
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM☆47Updated 2 months ago
- ☆83Updated 9 months ago
- Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'☆44Updated 3 weeks ago
- This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"☆138Updated 5 months ago
- Official repository of MMDU dataset☆61Updated last month
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds☆80Updated 2 months ago
- Code for the paper: "SuS-X: Training-Free Name-Only Transfer of Vision-Language Models" [ICCV'23]☆91Updated last year
- Dense Connector for MLLMs☆98Updated last month
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆72Updated 3 months ago
- ☆58Updated this week
- A collection of visual instruction tuning datasets.☆74Updated 6 months ago
- Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆44Updated 3 months ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024.☆49Updated this week
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆36Updated last year
- ☆100Updated last month
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft☆37Updated 2 months ago
- [EMNLP'23] The official GitHub page for ''Evaluating Object Hallucination in Large Vision-Language Models''☆67Updated 5 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆70Updated 2 weeks ago
- Task Residual for Tuning Vision-Language Models (CVPR 2023)☆65Updated last year
- The official implementation of 《MLLMs-Augmented Visual-Language Representation Learning》☆31Updated 6 months ago
- ☆37Updated 2 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"☆57Updated 2 months ago