yfzhang114 / SliME

✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models

☆164

Alternatives and similar repositories for SliME

Users who are interested in SliME compare it with the repositories listed below.

MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆151 Oct 21, 2025 Updated 3 months ago
thunlp / LLaVA-UHD
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆413 Dec 20, 2025 Updated last month
ParadoxZW / LLaVA-UHD-Better
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
☆34 Aug 12, 2024 Updated last year
LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20 Oct 17, 2024 Updated last year
baaivision / EVE
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆368 Jul 24, 2025 Updated 6 months ago
PhoenixZ810 / MG-LLaVA
Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).
☆159 Sep 27, 2024 Updated last year
bfshi / scaling_on_scales
When do we not need larger vision models?
☆412 Feb 8, 2025 Updated last year
alibaba / conv-llava
☆124 Jul 29, 2024 Updated last year
MMStar-Benchmark / MMStar
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
☆203 Sep 26, 2024 Updated last year
LLaVA-VL / LLaVA-NeXT
☆4,562 Sep 14, 2025 Updated 5 months ago
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆412 May 5, 2025 Updated 9 months ago
luogen1996 / LLaVA-HR
[ICLR 2025] LLaVA-HR: High-Resolution Large Language-Vision Assistant
☆246 Aug 14, 2024 Updated last year
RLHF-V / RLHF-V
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
☆306 Sep 11, 2024 Updated last year
cambrian-mllm / cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆1,985 Nov 7, 2025 Updated 3 months ago
CircleRadon / TokenPacker
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
☆276 May 26, 2025 Updated 8 months ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆180 Oct 14, 2024 Updated last year
XMUDeepLIT / AVG-LLaVA
Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"
☆33 Oct 12, 2024 Updated last year
Kwai-YuanQi / MM-RLHF
The Next Step Forward in Multimodal LLM Alignment
☆197 May 1, 2025 Updated 9 months ago
FreedomIntelligence / ALLaVA
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
☆281 Jun 25, 2024 Updated last year
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆204 Jun 18, 2025 Updated 7 months ago
RLHF-V / RLAIF-V
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
☆443 May 14, 2025 Updated 9 months ago
MME-Benchmarks / MME-Unify
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆43 Apr 10, 2025 Updated 10 months ago
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆79 Jun 17, 2024 Updated last year
Han-Zongbo / Skip-n
This repository contains the code of our paper 'Skip \n: A simple method to reduce hallucination in Large Vision-Language Models'.
☆15 Feb 12, 2024 Updated 2 years ago
hwanyu112 / Latent-Sketchpad
☆64 Feb 1, 2026 Updated 2 weeks ago
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]
☆239 Jan 3, 2026 Updated last month
SHI-Labs / CuMo
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
☆162 Jun 8, 2024 Updated last year
DCDmllm / HyperLLaVA
PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆28 Mar 22, 2024 Updated last year
swordlidev / Efficient-Multimodal-LLMs-Survey
Efficient Multimodal Large Language Models: A Survey
☆387 Apr 29, 2025 Updated 9 months ago
WeihuangLin / INF-LLaVA
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆42 Aug 4, 2024 Updated last year
yfzhang114 / LLaVA-Align
[ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual…
☆82 Feb 22, 2025 Updated 11 months ago
42Shawn / LLaVA-PruMerge
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
☆163 Sep 27, 2025 Updated 4 months ago
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆400 Mar 18, 2025 Updated 10 months ago
dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆233 Nov 7, 2025 Updated 3 months ago
locuslab / llava-token-compression
☆46 Nov 8, 2024 Updated last year
UMass-Embodied-AGI / FlexAttention
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆46 Jan 8, 2025 Updated last year
SJTU-DENG-Lab / UniCMs
☆39 May 20, 2025 Updated 8 months ago
yfzhang114 / Awesome-Multimodal-Large-Language-Models
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
☆925 Jan 31, 2026 Updated 2 weeks ago
RunpeiDong / DreamLLM
[ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation
☆458 Dec 2, 2024 Updated last year
