deepglint / Croc
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
☆13 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for Croc
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆48 · Updated last month
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆151 · Updated 4 months ago
- Making LLaVA Tiny via MoE-Knowledge Distillation ☆55 · Updated 2 weeks ago
- ☆33 · Updated 4 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆116 · Updated last month
- 🌋👵🏻 Yo'LLaVA: Your Personalized Language and Vision Assistant ☆66 · Updated 2 weeks ago
- The official code of the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate". ☆78 · Updated 2 weeks ago
- [NeurIPS 2023 Datasets and Benchmarks] "FETV: A Benchmark for Fine-Grained Evaluation of Open-Domain Text-to-Video Generation", Yuanxin L… ☆48 · Updated 8 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆35 · Updated 7 months ago
- 【NeurIPS 2024】Dense Connector for MLLMs ☆137 · Updated 3 weeks ago
- Official repository of MMDU dataset ☆74 · Updated last month
- The official implementation of RAR ☆72 · Updated 7 months ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆45 · Updated 2 months ago
- ☆42 · Updated last month
- LMM which strictly superset LLM embedded ☆31 · Updated last week
- A collection of visual instruction tuning datasets. ☆76 · Updated 7 months ago
- [ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs ☆64 · Updated last week
- ☆84 · Updated 11 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆106 · Updated 2 weeks ago
- ☆30 · Updated last month
- Official implementation of MIA-DPO ☆32 · Updated last week
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆52 · Updated 2 months ago
- ☆131 · Updated 10 months ago
- The official code of the paper "PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction". ☆42 · Updated last week
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft ☆39 · Updated 3 months ago
- HallE-Control: Controlling Object Hallucination in LMMs ☆28 · Updated 7 months ago
- T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation ☆46 · Updated 2 months ago
- Unified Multi-modal IAA Baseline and Benchmark ☆70 · Updated last month
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation ☆25 · Updated 2 months ago
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs) ☆22 · Updated 4 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆48 · Updated 5 months ago