MetabrainAGI / Awaker2.5-VL

☆35

Alternatives and similar repositories for Awaker2.5-VL

Users interested in Awaker2.5-VL are comparing it to the libraries listed below.


SparksJoe / Prism
A Framework for Decoupling and Assessing the Capabilities of VLMs
☆43 · Updated last year
EvolvingLMMs-Lab / MGPO
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆52 · Updated 6 months ago
FudanNLPLAB / MouSi
☆75 · Updated last year
mengcye / LAION-SG
☆56 · Updated 9 months ago
xmu-xiaoma666 / Multimodal-Open-O1
Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…
☆29 · Updated last year
mini-sora / MiniSora-DiT
minisora-DiT: a DiT reproduction based on XTuner, from the open-source community MiniSora
☆40 · Updated last year
EvolvingLMMs-Lab / VideoMMMU
Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos
☆65 · Updated 5 months ago
yihedeng9 / OpenVLThinker
OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement
☆129 · Updated 6 months ago
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆48 · Updated 11 months ago
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
☆70 · Updated 3 months ago
WeihuangLin / INF-LLaVA
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆42 · Updated last year
FreedomIntelligence / LongLLaVA
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆213 · Updated last year
MBZUAI-LLM / web2code
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs
☆100 · Updated last year
gpt4video / GPT4Video
Official code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
☆144 · Updated last year
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35 · Updated last year
xcltql666 / DenseDiT
Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios"
☆28 · Updated 7 months ago
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆171 · Updated last year
beichenzbc / BoostStep
Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning"
☆37 · Updated last year
SJTU-DENG-Lab / UniCMs
☆39 · Updated 8 months ago
Tiezheng11 / Vision-Language-Vision
☆63 · Updated 7 months ago
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆64 · Updated 6 months ago
agents-x-project / PyVision
[MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."
☆147 · Updated 6 months ago
BytedanceDouyinContent / SAIL-VL2
The SAIL-VL2 series of models, developed by the BytedanceDouyinContent Group
☆76 · Updated 4 months ago
cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆130 · Updated last year
yfzhang114 / SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
☆164 · Updated last year
HumanMLLM / HumanOmniV2
☆148 · Updated 6 months ago
rotem-shalev / ImageRAG
☆95 · Updated 11 months ago
Vchitect / Evaluation-Agent
[ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible
☆119 · Updated 6 months ago
DCDmllm / HyperLLaVA
PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models
☆28 · Updated last year
bytedance / ContentV
☆132 · Updated 7 months ago