PKU-Alignment / align-anything

Align Anything: Training All-modality Model with Feedback

☆4,601

Alternatives and similar repositories for align-anything

Users who are interested in align-anything compare it to the libraries listed below.

EvolvingLMMs-Lab / lmms-eval
One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks
☆3,339 Updated last week
Yuliang-Liu / Monkey
Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models (CVPR 2024 Highlight)
☆1,936 Updated last month
Simple-Efficient / RL-Factory
Train your Agent model via our easy and efficient framework
☆1,642 Updated this week
SkyworkAI / Skywork-R1V
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.
☆3,123 Updated this week
om-ai-lab / OmAgent
Build multimodal language agents for fast prototype and production
☆2,600 Updated 8 months ago
dhcode-cpp / X-R1
Minimal-cost training of a 0.5B R1-Zero model
☆790 Updated 6 months ago
HITsz-TMG / Uni-MoE
Uni-MoE: Lychee's Large Multimodal Model Family.
☆1,039 Updated last week
PKU-YuanGroup / Machine-Mindset
An MBTI Exploration of Large Language Models
☆513 Updated last year
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆763 Updated 3 months ago
Qihoo360 / 360-LLaMA-Factory
Adds sequence parallelism to LLaMA-Factory
☆598 Updated last month
HKUDS / VideoRAG
[KDD'2026] "VideoRAG: Chat with Your Videos"
☆1,346 Updated 2 weeks ago
luo-junyu / Awesome-Agent-Papers
[Up-to-date] Large Language Model Agent: A Survey on Methodology, Applications and Challenges
☆2,218 Updated last month
Ola-Omni / Ola
Ola: Pushing the Frontiers of Omni-Modal Language Model
☆380 Updated 5 months ago
gpt-omni / mini-omni2
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
☆1,842 Updated 10 months ago
ShareGPT4Omni / ShareGPT4Video
[NeurIPS 2024] An official implementation of "ShareGPT4Video: Improving Video Understanding and Generation with Better Captions"
☆1,079 Updated last year
pat-jj / DeepRetrieval
[COLM’25] DeepRetrieval — 🔥 The First Search Agent Trained by On-Policy Reinforcement Learning
☆677 Updated last month
EvolvingLMMs-Lab / open-r1-multimodal
A fork to add multimodal model training to open-r1
☆1,426 Updated 10 months ago
showlab / Show-o
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
☆1,803 Updated last month
hiyouga / EasyR1
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
☆4,204 Updated this week
ZiyuGuo99 / Image-Generation-CoT
[CVPR 2025] The First Investigation of CoT Reasoning (RL, TTS, Reflection) in Image Generation
☆841 Updated 6 months ago
HJYao00 / Mulberry
[NIPS'25 Spotlight] Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS
☆1,227 Updated 2 months ago
MiroMindAI / MiroThinker
MiroThinker is a series of open-source agentic models trained for deep research and complex tool use scenarios.
☆1,266 Updated this week
Coobiw / MPP-LLaVA
Personal Project: MPP-Qwen14B & MPP-Qwen-Next(Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conv…
☆532 Updated 9 months ago
yfzhang114 / r1_reward
✨✨R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆271 Updated 7 months ago
VITA-MLLM / VITA-Audio
✨✨[NeurIPS 2025] VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
☆667 Updated 6 months ago
XueZeyue / DanceGRPO
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
☆1,325 Updated last month
Fancy-MLLM / R1-Onevision
R1-onevision, a visual language model capable of deep CoT reasoning.
☆572 Updated 7 months ago
jingyi0000 / VLM_survey
Collection of AWESOME vision-language models for vision tasks
☆3,024 Updated last month
VITA-MLLM / VITA
✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
☆2,457 Updated 8 months ago
MoonshotAI / Kimi-VL
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
☆1,119 Updated 4 months ago