Alpha-Innovator / Dolphin
☆16 · Updated last month
Alternatives and similar repositories for Dolphin
Users who are interested in Dolphin compare it to the repositories listed below.
- ☆16 · Updated 3 weeks ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆84 · Updated last week
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆34 · Updated 4 months ago
- The code implementation of Symbolic-MoE ☆31 · Updated 2 months ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆189 · Updated last month
- [ICLR2025 Oral] ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding ☆78 · Updated last month
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆35 · Updated 3 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆66 · Updated 11 months ago
- ☆75 · Updated 4 months ago
- ☆26 · Updated last month
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆43 · Updated 2 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆73 · Updated last month
- ☆97 · Updated last month
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆40 · Updated last week
- ☆38 · Updated last week
- Official implementation of paper "ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting" (CVPR 2025) ☆41 · Updated last month
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆54 · Updated last week
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆90 · Updated last week
- ☆33 · Updated 3 months ago
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ☆160 · Updated 2 weeks ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆56 · Updated 6 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆100 · Updated 2 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆77 · Updated this week
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆94 · Updated 2 months ago
- ☆41 · Updated 4 months ago
- Official implementation of the Law of Vision Representation in MLLMs ☆155 · Updated 6 months ago
- ☆12 · Updated this week
- ☆14 · Updated 4 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆78 · Updated 3 months ago
- ☆27 · Updated this week