friedrichor / LLaVA-NeXT-ReproducedLinks

Reproduced LLaVA-NeXT with training code and scripts.

☆10

Alternatives and similar repositories for LLaVA-NeXT-Reproduced

Users that are interested in LLaVA-NeXT-Reproduced are comparing it to the libraries listed below

Sorting:

LanqingL / SCS
"Visual Prompt Selection for In-Context Learning Segmentation Framework"
☆15Updated 7 months ago
Meituan-AutoML / Lenna
☆86Updated last year
SY-Xuan / Pink
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
☆91Updated 6 months ago
eric-ai-lab / GRIT
Official code for paper "GRIT: Teaching MLLMs to Think with Images"
☆112Updated last week
alibaba / conv-llava
☆118Updated 11 months ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆71Updated last year
Liuziyu77 / RAR
The official implementation of RAR
☆88Updated last year
Hon-Wong / Elysium
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
☆78Updated 8 months ago
maifoundations / Visionary-R1
Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning
☆30Updated 2 weeks ago
AFeng-x / Draw-and-Understand
[ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
☆82Updated last month
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆56Updated 8 months ago
JierunChen / Ref-L4
Evaluation code for Ref-L4, a new REC benchmark in the LMM era
☆38Updated 6 months ago
ucasyjz / VIP
[ACCV 2024 Poster] official code for "VIP: Versatile Image Outpainting Empowered by Multimodal Large Language Model"
☆9Updated 9 months ago
yuecao0119 / MMInstruct
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆55Updated 8 months ago
LALBJ / PAI
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
☆130Updated 8 months ago
SliMM-X / CoMP-MM
Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"
☆29Updated 3 months ago
LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20Updated 9 months ago
google / haloquest
☆20Updated 11 months ago
RUCAIBox / ComVint
The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…
☆19Updated last year
yayafengzi / LMM-HiMTok
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
☆52Updated this week
MME-Benchmarks / MME-RealWorld
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆128Updated 4 months ago
JieShibo / MemVP
[ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
☆49Updated last year
wusize / F-LMM
[CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models
☆98Updated last month
yu-rp / VisualPerceptionToken
☆89Updated 4 months ago
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆60Updated last year
ligeng0197 / Awesome-Thinking-With-Images
Latest open-source "Thinking with images" (O3/O4-mini) papers, covering training-free, SFT-based, and RL-enhanced methods for "fine-grain…
☆75Updated 2 weeks ago
Jiaxing-star / LLaVA-Octopus
☆11Updated 6 months ago
HJYao00 / DenseConnector
【NeurIPS 2024】Dense Connector for MLLMs
☆171Updated 9 months ago
appletea233 / Temporal-R1
Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency
☆44Updated last month
Cooperx521 / PyramidDrop
(CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
☆116Updated 4 months ago