MetabrainAGI / Awaker2.5-VL
☆35 · Updated last year
Alternatives and similar repositories for Awaker2.5-VL
Users that are interested in Awaker2.5-VL are comparing it to the libraries listed below
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆43 · Updated last year
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 6 months ago
- ☆75 · Updated last year
- ☆56 · Updated 9 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 · Updated last year
- minisora-DiT, a DiT reproduction based on XTuner from the open source community MiniSora ☆40 · Updated last year
- Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos ☆65 · Updated 5 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆129 · Updated 6 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆48 · Updated 11 months ago
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation ☆70 · Updated 3 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 · Updated last year
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆213 · Updated last year
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆100 · Updated last year
- Official code for GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation ☆144 · Updated last year
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆35 · Updated last year
- Code for "From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios" ☆28 · Updated 7 months ago
- Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024) ☆171 · Updated last year
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆37 · Updated last year
- ☆39 · Updated 8 months ago
- ☆63 · Updated 7 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆64 · Updated 6 months ago
- [MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆147 · Updated 6 months ago
- The SAIL-VL2 series model developed by the Bytedance Douyin Content Group ☆76 · Updated 4 months ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆130 · Updated last year
- ✨✨ Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆164 · Updated last year
- ☆148 · Updated 6 months ago
- ☆95 · Updated 11 months ago
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆119 · Updated 6 months ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Updated last year
- ☆132 · Updated 7 months ago