MoonshotAI / Kimi-VL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
☆819Updated 2 weeks ago
Alternatives and similar repositories for Kimi-VL:
Users who are interested in Kimi-VL are comparing it to the repositories listed below.
- Explore the Multimodal “Aha Moment” on 2B Model☆583Updated last month
- A fork to add multimodal model training to open-r1☆1,245Updated 3 months ago
- Muon is Scalable for LLM Training☆1,039Updated last month
- R1-onevision, a visual language model capable of deep CoT reasoning.☆513Updated 3 weeks ago
- ☆739Updated 2 weeks ago
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning☆590Updated this week
- Next-Token Prediction is All You Need☆2,111Updated last month
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.☆742Updated this week
- This is the first paper to explore how to effectively use RL for MLLMs and introduce Vision-R1, a reasoning MLLM that leverages cold-sta…☆540Updated 3 weeks ago
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.☆900Updated last month
- Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]☆489Updated last week
- EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL☆2,258Updated this week
- An Open-source RL System from ByteDance Seed and Tsinghua AIR☆1,198Updated 3 weeks ago
- Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey☆542Updated 2 weeks ago
- Official Repo for Open-Reasoner-Zero☆1,904Updated last month
- Dream 7B, a large diffusion language model☆613Updated this week
- ☆857Updated last month
- Understanding R1-Zero-Like Training: A Critical Perspective☆908Updated 3 weeks ago
- ☆679Updated 3 weeks ago
- Large Reasoning Models☆804Updated 5 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs☆1,768Updated last month
- ☆358Updated 3 months ago
- Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation☆756Updated 9 months ago
- Tarsier -- a family of large-scale video-language models designed to generate high-quality video descriptions, together with g…☆361Updated 2 weeks ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".☆253Updated 2 months ago
- [Survey] Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey☆422Updated 3 months ago
- Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊☆267Updated 3 months ago
- Official repository of "Visual-RFT: Visual Reinforcement Fine-Tuning"☆1,627Updated 3 weeks ago
- An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.☆697Updated last week
- Rethinking Step-by-step Visual Reasoning in LLMs☆292Updated 3 months ago