microsoft / nanovppo
Nano repo for RL training of LLMs
☆66 · Updated this week
Alternatives and similar repositories for nanovppo
Users interested in nanovppo are comparing it to the libraries listed below.
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆68 · Updated 2 months ago
- ☆83 · Updated 2 months ago
- A Comprehensive Survey on Long Context Language Modeling ☆197 · Updated 3 months ago
- Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization ☆77 · Updated last month
- [COLM 2025] An Open Math Pre-training Dataset with 370B Tokens. ☆106 · Updated 6 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond ☆170 · Updated 3 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆247 · Updated 6 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆165 · Updated last year
- Async pipelined version of Verl ☆123 · Updated 6 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆236 · Updated last month
- Long Context Extension and Generalization in LLMs ☆62 · Updated last year
- ☆108 · Updated last year
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆149 · Updated 7 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS 2025] ☆186 · Updated 4 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆137 · Updated last year
- ☆90 · Updated 5 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆261 · Updated last month
- WideSearch: Benchmarking Agentic Broad Info-Seeking ☆96 · Updated 3 weeks ago
- [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches ☆56 · Updated 7 months ago
- ☆55 · Updated 4 months ago
- Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth ☆197 · Updated last week
- ☆100 · Updated last month
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆191 · Updated 3 weeks ago
- [EMNLP 2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆66 · Updated 6 months ago
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025] ☆178 · Updated 3 months ago
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning ☆264 · Updated last month
- ☆106 · Updated 3 months ago
- ☆85 · Updated 9 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data, with all details. ☆218 · Updated 3 months ago
- ☆205 · Updated this week