QwenLM / QwQ
QwQ is the reasoning model series developed by the Qwen team, Alibaba Cloud.
☆484 Updated last month
Alternatives and similar repositories for QwQ:
Users who are interested in QwQ compare it to the repositories listed below:
- DeepRetrieval - Hacking 🔥Real Search Engines and Retrievers with LLM via RL ☆478 Updated 3 weeks ago
- Adds Sequence Parallelism into LLaMA-Factory ☆471 Updated last week
- PaSa -- an advanced paper search agent powered by large language models. It can autonomously make a series of decisions, including invoki… ☆1,141 Updated 3 months ago
- minimal-cost for training 0.5B R1-Zero ☆714 Updated 2 weeks ago
- ☆739 Updated 2 weeks ago
- [NeurIPS 2024] BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models ☆257 Updated last month
- ✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions - all in one framework ☆205 Updated last month
- The official implementation of Self-Play Preference Optimization (SPPO) ☆545 Updated 3 months ago
- Codebase for Iterative DPO Using Rule-based Rewards ☆243 Updated 3 weeks ago
- Recipes to train the self-rewarding reasoning LLMs. ☆214 Updated 2 months ago
- ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning ☆808 Updated last week
- Align Anything: Training All-modality Model with Feedback ☆3,611 Updated last week
- ☆155 Updated 2 weeks ago
- GPTSwarm: LLM agents as (Optimizable) Graphs ☆837 Updated 4 months ago
- Skywork-R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning (the best multimodal reasoning) ☆2,404 Updated last week
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,039 Updated 4 months ago
- Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling". ☆253 Updated 2 months ago
- Scaling Deep Research via Reinforcement Learning in Real-world Environments. ☆312 Updated 3 weeks ago
- Large Reasoning Models ☆804 Updated 5 months ago
- Easiest and laziest way for building multi-agent LLMs applications. ☆1,700 Updated last week
- ☆679 Updated 3 weeks ago
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ☆180 Updated this week
- Mulberry, an o1-like Reasoning and Reflection MLLM Implemented via Collective MCTS ☆1,173 Updated last month
- Muon is Scalable for LLM Training ☆1,039 Updated last month
- AN O1 REPLICATION FOR CODING ☆333 Updated 4 months ago
- A Comprehensive Benchmark for Code Information Retrieval. ☆85 Updated last month
- Collect every awesome work about r1! ☆356 Updated last week
- Unleashing the Power of Reinforcement Learning for Math and Code Reasoners ☆540 Updated 2 weeks ago
- VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024) ☆455 Updated 2 weeks ago
- Build multimodal language agents for fast prototype and production ☆2,479 Updated last month