QwenLM / QwQ
QwQ is the reasoning model series developed by the Qwen team at Alibaba Cloud.
☆501 Updated 2 months ago
Alternatives and similar repositories for QwQ
Users who are interested in QwQ are comparing it to the libraries listed below.
- Moxin is a family of fully open-source and reproducible LLMs ☆588 Updated last month
- Train your Agent model via our easy and efficient framework ☆776 Updated this week
- DeepRetrieval - 🔥 Training Search Agent with Retrieval Outcomes via Reinforcement Learning ☆521 Updated this week
- Official Repository of Cooragent ☆1,306 Updated this week
- ☆773 Updated last month
- adds Sequence Parallelism into LLaMA-Factory ☆498 Updated this week
- ✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework ☆226 Updated 2 months ago
- PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoki… ☆1,182 Updated this week
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in… ☆232 Updated this week
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆261 Updated 3 months ago
- VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024) ☆473 Updated last month
- 🐝 The First Graph Agentic Framework with RL and Prompt Optimization ☆876 Updated 4 months ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆345 Updated 2 weeks ago
- ☆223 Updated this week
- minimal-cost for training 0.5B R1-Zero ☆730 Updated 2 weeks ago
- [NeurIPS 2024] Official code for HourVideo: 1-Hour Video Language Understanding ☆149 Updated 2 months ago
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,098 Updated 4 months ago
- Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning' ☆1,930 Updated last week
- R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization ☆392 Updated last month
- Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS ☆1,185 Updated 2 months ago
- Unleashing the Power of Reinforcement Learning for Math and Code Reasoners ☆607 Updated this week
- TTRL: Test-Time Reinforcement Learning ☆570 Updated last week
- Align Anything: Training All-modality Model with Feedback ☆3,814 Updated this week
- Recipes to train the self-rewarding reasoning LLMs. ☆219 Updated 3 months ago
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities ☆876 Updated last month
- ☆208 Updated last week
- The official implementation of Self-Play Preference Optimization (SPPO) ☆563 Updated 4 months ago
- Codebase for Iterative DPO Using Rule-based Rewards ☆245 Updated last month
- [NeurIPS 2024] BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models ☆258 Updated 2 months ago
- Build multimodal language agents for fast prototype and production ☆2,491 Updated 2 months ago