xinke-wang / ModaVerse
[CVPR2024] ModaVerse: Efficiently Transforming Modalities with LLMs
☆29Updated 9 months ago
Alternatives and similar repositories for ModaVerse:
Users who are interested in ModaVerse are comparing it to the repositories listed below
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501 ☆56Updated 9 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆57Updated 7 months ago
- ☆99Updated 9 months ago
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆73Updated 5 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆58Updated 4 months ago
- CLIP-MoE: Mixture of Experts for CLIP☆31Updated 6 months ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆32Updated last year
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆49Updated last month
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models☆57Updated 4 months ago
- SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models☆92Updated this week
- LMM solved catastrophic forgetting, AAAI2025☆40Updated last week
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆43Updated 5 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models☆126Updated 11 months ago
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…☆78Updated 2 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆37Updated 6 months ago
- MIO: A Foundation Model on Multimodal Tokens☆25Updated 4 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆76Updated 7 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆132Updated 5 months ago
- Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized?"☆112Updated 2 weeks ago
- [NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning☆188Updated 4 months ago
- ☆73Updated last month
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆22Updated 8 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆34Updated last year
- ☆34Updated 9 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆46Updated this week
- V1: Toward Multimodal Reasoning by Designing Auxiliary Task☆33Updated last week
- (ICLR2025 Spotlight) DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.☆29Updated last month
- ☆54Updated last year
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models☆84Updated 6 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Updated last year