SkyworkAI / Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.
☆3,149 Updated last month
Alternatives and similar repositories for Skywork-R1V
Users who are interested in Skywork-R1V are comparing it to the repositories listed below.
- Align Anything: Training All-modality Model with Feedback ☆4,631 Updated 2 months ago
- [EMNLP-2024] Build multimodal language agents for fast prototype and production ☆2,624 Updated 10 months ago
- MiroFlow is an agent framework that enables tool-use agent tasks, featuring a reproducible GAIA score of 82.4%. ☆2,381 Updated last week
- Open-source SOTA multi-image editing model ☆850 Updated last week
- When Agent Becomes the Scientist – Building Closed-Loop System from Hypothesis to Verification ☆849 Updated 2 months ago
- [NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS ☆1,238 Updated 3 weeks ago
- Deep Research Agent CognitiveKernel-Pro from Tencent AI Lab. Paper: https://arxiv.org/pdf/2508.00414 ☆491 Updated 3 months ago
- TeleMem is a high-performance drop-in replacement for Mem0, featuring semantic deduplication, long-term dialogue memory, and multimodal v… ☆385 Updated last week
- (ACL-2025 main conference) SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automat… ☆316 Updated 5 months ago
- Uni-MoE: Lychee's Large Multimodal Model Family. ☆1,076 Updated last month
- 🚀 EvoAgentX: Building a Self-Evolving Ecosystem of AI Agents ☆2,507 Updated last month
- Train your Agent model via our easy and efficient framework ☆1,697 Updated 2 months ago
- ✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆281 Updated 8 months ago
- [COLM'25] DeepRetrieval: 🔥 Training Search Agent by RLVR with Retrieval Outcome ☆695 Updated 3 months ago
- A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources. ☆562 Updated last month
- Ola: Pushing the Frontiers of Omni-Modal Language Model ☆386 Updated 7 months ago
- [NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions" ☆1,084 Updated last year
- Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models ☆913 Updated 10 months ago
- [NeurIPS 2025] A Graph-based LLM Framework for Real-world SE Tasks ☆518 Updated 4 months ago
- 【ICML 2025 Spotlight】 Official Repo for Paper "HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generati… ☆1,588 Updated 3 months ago
- ☆334 Updated 5 months ago
- adds Sequence Parallelism into LLaMA-Factory ☆603 Updated 3 months ago
- 🔥 OneThinker: All-in-one Reasoning Model for Image and Video ☆388 Updated 3 weeks ago
- ScaleCUA is a family of open-source computer-use agents that can operate across platforms (Windows, macOS, Ubuntu, Android). ☆1,067 Updated last month
- A general-purpose API load testing platform that supports LLM services and business HTTP interfaces, enabling one-click performance testi… ☆171 Updated this week
- RepoMaster: The open-source AI agent that masters GitHub. It turns any code repository into a powerful tool, achieving a new level of aut… ☆490 Updated 3 months ago
- Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model ☆1,822 Updated 4 months ago
- LightAgent: Lightweight AI agent framework with memory, tools & tree-of-thought. Supports multi-agent collaboration, self-learning, and m… ☆497 Updated 2 weeks ago
- [Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges ☆2,402 Updated 2 months ago
- Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight) ☆1,947 Updated last week