lizhaoliu-Lec / CG-VLM
This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner.
☆20 · Updated 11 months ago
Related projects
Alternatives and complementary repositories for CG-VLM
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models ☆36 · Updated last year
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 · Updated last year
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU ☆41 · Updated last year
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning ☆44 · Updated 6 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆56 · Updated last year
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆24 · Updated last month
- ☆85 · Updated 11 months ago
- The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins… ☆18 · Updated last year
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023) ☆54 · Updated 7 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆49 · Updated 5 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆51 · Updated 3 months ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆82 · Updated 5 months ago
- The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". Th… ☆33 · Updated 2 weeks ago
- ☆27 · Updated 8 months ago
- ☆30 · Updated this week
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆42 · Updated 3 months ago
- ☆21 · Updated 3 months ago
- ☆24 · Updated 4 months ago
- [ICML2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" ☆20 · Updated last month
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆22 · Updated 2 weeks ago
- ☆57 · Updated last year
- Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization ☆66 · Updated 9 months ago
- Code for paper: VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning ☆29 · Updated 7 months ago
- A collection of visual instruction tuning datasets. ☆75 · Updated 8 months ago
- ☆17 · Updated 9 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024) ☆41 · Updated 4 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆90 · Updated 4 months ago
- VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation ☆84 · Updated 2 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆78 · Updated 8 months ago
- SVIT: Scaling up Visual Instruction Tuning ☆163 · Updated 5 months ago