XMUDeepLIT/AVG-LLaVA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/XMUDeepLIT/AVG-LLaVA)

XMUDeepLIT / AVG-LLaVA

Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"

☆33

Alternatives and similar repositories for AVG-LLaVA

Users that are interested in AVG-LLaVA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

UCSC-VLAA / CLIPS
View on GitHub
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆40Apr 18, 2025Updated last year
DripNowhy / ETA
View on GitHub
[ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"
☆34Jul 20, 2025Updated last year
mrwu-mac / ControlMLLM
View on GitHub
[NeurIPS2024] Repo for the paper `ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models'
☆210Jul 17, 2025Updated last year
YuxiXie / V-DPO
View on GitHub
Preference Learning for LLaVA
☆60Nov 9, 2024Updated last year
HJYao00 / DenseConnector
View on GitHub
【NeurIPS 2024】Dense Connector for MLLMs
☆183Oct 14, 2024Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
lizhou-cs / mglmm
View on GitHub
☆32Jun 14, 2026Updated last month
itsvaibhav01 / Immune
View on GitHub
[CVPR2025] Official Repository for IMMUNE: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
☆28Jun 11, 2025Updated last year
CircleRadon / TokenPacker
View on GitHub
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
☆279May 26, 2025Updated last year
YiyangZhou / CSR
View on GitHub
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆87Oct 26, 2025Updated 8 months ago
Yxxxb / VoCo-LLaMA
View on GitHub
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆205Jun 18, 2025Updated last year
callsys / ControlCap
View on GitHub
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆81Oct 25, 2024Updated last year
yfzhang114 / SliME
View on GitHub
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆163Dec 26, 2024Updated last year
baaivision / DenseFusion
View on GitHub
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
☆159Dec 6, 2024Updated last year
Lackel / AGLA
View on GitHub
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆68Jul 16, 2024Updated 2 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Vinoground / Vinoground
View on GitHub
☆13Apr 13, 2026Updated 3 months ago
WPR001 / Ego-ST
View on GitHub
☆16Sep 25, 2025Updated 9 months ago
MrZilinXiao / AutoVER
View on GitHub
[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.
☆14Mar 2, 2024Updated 2 years ago
iancovert / locality-alignment
View on GitHub
☆55Jan 17, 2025Updated last year
yuecao0119 / MMInstruct
View on GitHub
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64Nov 7, 2024Updated last year
princeton-pli / VLM_S2H
View on GitHub
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
☆19Jun 3, 2025Updated last year
lloongx / DIKI
View on GitHub
[ECCV 2024] Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
☆57Jul 9, 2024Updated 2 years ago
deepcs233 / Visual-CoT
View on GitHub
[Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …
☆447Dec 22, 2024Updated last year
opendatalab / HA-DPO
View on GitHub
Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
☆104Jan 30, 2024Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
ShuvenduRoy / CoPrompt
View on GitHub
[ICLR'24] Consistency-guided Prompt Learning for Vision-Language Models
☆86May 24, 2024Updated 2 years ago
Heidelberg-NLP / CC-SHAP-VLM
View on GitHub
Official code implementation for the paper "Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Expl…
☆12Jul 14, 2026Updated last week
yaolinli / DeCo
View on GitHub
Code for DeCo: Decoupling token compression from semanchc abstraction in multimodal large language models
☆80Jul 14, 2025Updated last year
mbzuai-oryx / VideoGLaMM
View on GitHub
[CVPR 2025 🔥]A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆104Apr 14, 2025Updated last year
m1k2zoo / negbench
View on GitHub
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆47Feb 26, 2026Updated 4 months ago
KD-TAO / DyCoke
View on GitHub
[CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
☆113Nov 22, 2025Updated 8 months ago
jiaangli / VILA
View on GitHub
[TACL/EMNLP'24] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study
☆16Nov 22, 2024Updated last year
MCG-NJU / p-MoD
View on GitHub
[ICCV 2025] p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
☆44Jun 26, 2025Updated last year
zhengyuan-xie / ECCV24_NeST
View on GitHub
[ECCV 2024] Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
☆39Mar 3, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
hulianyuyy / AdaptSign
View on GitHub
Improving Continuous Sign Language Recognition with Adapted Image Models
☆16Nov 10, 2025Updated 8 months ago
snumprlab / isr-dpo
View on GitHub
Official Implementation of ISR-DPO:Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)
☆23Nov 25, 2025Updated 7 months ago
uvavision / SyViC
View on GitHub
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13Sep 30, 2023Updated 2 years ago
yuezih / less-is-more
View on GitHub
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆58Oct 28, 2024Updated last year
nicolay-r / bulk-chain
View on GitHub
A no-string API framework for deploying schema-based reasoning into third-party apps
☆23Jul 10, 2026Updated last week
huang-yh / Owl
View on GitHub
☆52Dec 13, 2024Updated last year
learning3d / assignment3
View on GitHub
Volume Rendering, Neural Radiance Fields, Neural Surfaces
☆25Oct 21, 2025Updated 9 months ago