microsoft / nanovppo
Nano repo for RL training of LLMs
☆70Updated 3 months ago
Alternatives and similar repositories for nanovppo
Users interested in nanovppo are comparing it to the libraries listed below
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm…☆99Updated 5 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆246Updated 4 months ago
- General Reasoner: Advancing LLM Reasoning Across All Domains [NeurIPS 2025]☆216Updated 2 months ago
- [NeurIPS 2025] The official repo of SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond☆191Updated 7 months ago
- ☆112Updated last year
- Async pipelined version of Verl☆124Updated 10 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs☆204Updated 2 months ago
- ☆63Updated 7 months ago
- Code for paper "Patch-Level Training for Large Language Models"☆97Updated 2 months ago
- A Comprehensive Survey on Long Context Language Modeling☆226Updated 2 months ago
- LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens.☆277Updated 3 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings☆168Updated last year
- Long Context Extension and Generalization in LLMs☆62Updated last year
- WideSearch: Benchmarking Agentic Broad Info-Seeking☆118Updated 4 months ago
- We aim to provide the best references to search, select, and synthesize high-quality and large-quantity data for post-training your LLMs.☆61Updated last year
- ☆87Updated 5 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)☆163Updated 9 months ago
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆249Updated 9 months ago
- Fused Qwen3 MoE layer for faster training, compatible with Transformers, LoRA, bnb 4-bit quant, Unsloth. Also possible to train LoRA over…☆231Updated this week
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆139Updated last year
- [ACL 2025 Findings] Autonomous Data Selection with Zero-shot Generative Classifiers for Mathematical Texts (As Huggingface Daily Papers: …☆90Updated 2 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆126Updated last year
- ☆209Updated 3 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization☆42Updated 11 months ago
- ☆85Updated 2 months ago
- REST: Retrieval-Based Speculative Decoding, NAACL 2024☆215Updated 4 months ago
- Code for ICML 25 paper "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"☆49Updated 7 months ago
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale☆265Updated 7 months ago
- A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning☆283Updated 4 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop.☆289Updated 3 months ago