Osilly / dynamic_llava
The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification".
☆13Updated last month
Alternatives and similar repositories for dynamic_llava:
Users who are interested in dynamic_llava are comparing it to the repositories listed below.
- CLIP-MoE: Mixture of Experts for CLIP☆23Updated 3 months ago
- ☆19Updated 2 months ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆32Updated last month
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆17Updated this week
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆54Updated 2 months ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆51Updated 4 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)☆42Updated 5 months ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆53Updated last year
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆67Updated last month
- Retrieval-Augmented Personalization☆11Updated last month
- ☆22Updated 7 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models☆58Updated 7 months ago
- ☆92Updated last year
- A Self-Training Framework for Vision-Language Reasoning☆57Updated last month
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Infe…☆86Updated 2 months ago
- DEEM: Official implementation of Diffusion models serve as the eyes of large language models for image perception.☆17Updated last month
- AutoHallusion Codebase (EMNLP 2024)☆16Updated last month
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…☆75Updated 9 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆31Updated this week
- This repo contains code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation"☆10Updated this week
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆18Updated 4 months ago
- Official repo of γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆28Updated 2 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context"☆28Updated 5 months ago
- ☆25Updated 6 months ago
- ☆47Updated this week
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆39Updated 2 months ago
- MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆31Updated last month
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…☆16Updated last month
- ✈️ Accelerating Vision Diffusion Transformers with Skip Branches.☆58Updated 3 weeks ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆19Updated 3 weeks ago