xinke-wang / ModaVerse
[CVPR2024] ModaVerse: Efficiently Transforming Modalities with LLMs
☆29 Updated 11 months ago
Alternatives and similar repositories for ModaVerse
Users who are interested in ModaVerse are comparing it to the repositories listed below.
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆55 Updated 10 months ago
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat… ☆78 Updated 3 months ago
- ☆54 Updated last year
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆46 Updated 6 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆56 Updated 8 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆23 Updated 3 weeks ago
- ☆37 Updated 11 months ago
- LMM solved catastrophic forgetting, AAAI2025 ☆43 Updated last month
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆34 Updated last year
- [ICLR2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models ☆36 Updated 3 months ago
- Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More ☆23 Updated 3 months ago
- Data distillation benchmark ☆64 Updated this week
- [CVPR 2025] RAP: Retrieval-Augmented Personalization ☆56 Updated 2 weeks ago
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?" ☆117 Updated 2 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆82 Updated 5 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆28 Updated last year
- EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning [🔥The Exploration of R1 for General Audio-Vi… ☆31 Updated 3 weeks ago
- ☆14 Updated 7 months ago
- ☆105 Updated 11 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆131 Updated last month
- ☆81 Updated 2 months ago
- ☆51 Updated last year
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria ☆69 Updated 7 months ago
- Dataset pruning for ImageNet and LAION-2B. ☆79 Updated 11 months ago
- MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379) ☆38 Updated last year
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆131 Updated last year
- Code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization" ☆55 Updated 9 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆25 Updated 5 months ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" ☆74 Updated 6 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models ☆115 Updated last month