SkyworkAI / Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI, specializing in vision-language reasoning.
☆3,143 Updated last month
Alternatives and similar repositories for Skywork-R1V
Users who are interested in Skywork-R1V are comparing it to the repositories listed below.
- MiroFlow is an agent framework that enables tool-use agent tasks, featuring a reproducible GAIA score of 82.4%. ☆2,180 Updated this week
- Align Anything: Training All-modality Model with Feedback ☆4,620 Updated last month
- Train your Agent model via our easy and efficient framework ☆1,687 Updated last month
- (ACL-2025 main conference) SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automat… ☆314 Updated 4 months ago
- [COLM’25] DeepRetrieval - 🔥 Training Search Agent by RLVR with Retrieval Outcome ☆693 Updated 3 months ago
- Deep Research Agent CognitiveKernel-Pro from Tencent AI Lab. Paper: https://arxiv.org/pdf/2508.00414 ☆485 Updated 3 months ago
- [NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS ☆1,235 Updated this week
- Open-source SOTA multi-image editing model ☆827 Updated last week
- [EMNLP-2024] Build multimodal language agents for fast prototype and production ☆2,622 Updated 10 months ago
- When Agent Becomes the Scientist – Building Closed-Loop System from Hypothesis to Verification ☆837 Updated 2 months ago
- adds Sequence Parallelism into LLaMA-Factory ☆600 Updated 3 months ago
- ☆332 Updated 4 months ago
- Matrix-Game 2.0: An Open-Source, Real-Time, and Streaming Interactive World Model ☆1,807 Updated 3 months ago
- ✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning ☆277 Updated 8 months ago
- Ola: Pushing the Frontiers of Omni-Modal Language Model ☆384 Updated 7 months ago
- [NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions" ☆1,083 Updated last year
- A collection of multimodal reasoning papers, codes, datasets, benchmarks and resources. ☆553 Updated last month
- TeleMem is a high-performance drop-in replacement for Mem0, featuring semantic deduplication, long-term dialogue memory, and multimodal v… ☆315 Updated last week
- 🚀 EvoAgentX: Building a Self-Evolving Ecosystem of AI Agents ☆2,466 Updated 2 weeks ago
- "Your Fully-Automated Personal AI Assistant" ☆1,348 Updated 3 months ago
- ScaleCUA is the open-sourced computer use agents that can operate on cross-platform environments (Windows, macOS, Ubuntu, Android). ☆1,054 Updated 2 weeks ago
- RepoMaster: The open-source AI agent that masters GitHub. It turns any code repository into a powerful tool, achieving a new level of aut… ☆470 Updated 2 months ago
- An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation ☆1,466 Updated 3 months ago
- Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model ☆925 Updated 3 weeks ago
- Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models ☆913 Updated 10 months ago
- Turn paper/text/topic into editable research figures, technical route diagrams, and presentation slides. ☆964 Updated this week
- 🔥 OneThinker: All-in-one Reasoning Model for Image and Video ☆376 Updated last week
- PromptEnhancer is a prompt-rewriting tool, refining prompts into clearer, structured versions for better image generation. ☆2,978 Updated 2 months ago
- 【ICML 2025 Spotlight】 Official Repo for Paper ‘’HealthGPT : A Medical Large Vision-Language Model for Unifying Comprehension and Generati… ☆1,582 Updated 2 months ago
- Complex Reasoning Rag System, Agentic Rag System ☆245 Updated last month