ssyze / EVE
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
☆11 Updated last year
Alternatives and similar repositories for EVE:
Users who are interested in EVE are comparing it to the libraries listed below:
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆50 Updated last month
- ☆12 Updated 6 months ago
- Recent Advances on MLLM's Reasoning Ability ☆25 Updated 3 weeks ago
- [NIPS2023] Implementation of Foundation Model is Efficient Multimodal Multitask Model Selector ☆36 Updated last year
- ☆15 Updated last week
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆52 Updated 6 months ago
- ☆11 Updated 3 months ago
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 Updated last year
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆17 Updated 6 months ago
- [ICLR 2025] Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality ☆24 Updated 3 weeks ago
- CLIP-MoE: Mixture of Experts for CLIP ☆32 Updated 6 months ago
- ☆12 Updated last year
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt… ☆39 Updated 4 months ago
- [CVPR 2024 Highlight] ImageNet-D ☆42 Updated 6 months ago
- Official implementation of MC-LLaVA. ☆25 Updated 3 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆39 Updated 5 months ago
- ☆10 Updated 3 weeks ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… ☆49 Updated 5 months ago
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights ☆25 Updated 6 months ago
- Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning" ☆29 Updated 2 months ago
- An Enhanced CLIP Framework for Learning with Synthetic Captions ☆28 Updated 2 weeks ago
- Visual self-questioning for large vision-language assistant. ☆41 Updated 7 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models ☆18 Updated 3 months ago
- Collection of awesome Continual Test-Time Adaptation methods ☆17 Updated 11 months ago
- 🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation". ☆33 Updated last month
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 Updated last year
- LEO: A powerful Hybrid Multimodal LLM ☆18 Updated 3 months ago
- ☆16 Updated 8 months ago
- [CVPR 2025] CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning ☆12 Updated 2 weeks ago
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆46 Updated 6 months ago