convergence-ai / lm2
Official repo of paper LM2
☆44 Updated 7 months ago
Alternatives and similar repositories for lm2
Users who are interested in lm2 are comparing it to the repositories listed below
- Esoteric Language Models ☆99 Updated 2 months ago
- ☆123 Updated 7 months ago
- ☆85 Updated last year
- ☆105 Updated last year
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆34 Updated last month
- ☆27 Updated 3 months ago
- [EMNLP 2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆64 Updated 5 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆40 Updated this week
- ☆21 Updated last week
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆129 Updated last month
- Sotopia-RL: Reward Design for Social Intelligence ☆39 Updated last month
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper ☆41 Updated last month
- ☆33 Updated 8 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆100 Updated last month
- SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning ☆151 Updated 2 weeks ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆119 Updated 6 months ago
- ☆217 Updated 7 months ago
- A repository for research on medium sized language models. ☆78 Updated last year
- Process Reward Models That Think ☆53 Updated 3 months ago
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆54 Updated 10 months ago
- Replicating O1 inference-time scaling laws ☆90 Updated 10 months ago
- ☆72 Updated last year
- Reinforcing General Reasoning without Verifiers ☆87 Updated 3 months ago
- The official implementation of Self-Exploring Language Models (SELM) ☆64 Updated last year
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples ☆106 Updated 2 months ago
- Natural Language Reinforcement Learning ☆97 Updated 2 months ago
- This is the official repository for Inheritune. ☆113 Updated 7 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆59 Updated last year
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆93 Updated 4 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 Updated 6 months ago