convergence-ai / lm2
Official repo of the LM2 paper
☆34Updated last month
Alternatives and similar repositories for lm2:
Users that are interested in lm2 are comparing it to the libraries listed below
- ☆111Updated last month
- ☆74Updated 7 months ago
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks☆83Updated last week
- Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper☆28Updated last week
- ☆185Updated last month
- Systematic evaluation framework that automatically rates overthinking behavior in large language models.☆80Updated last month
- A repository for research on medium sized language models.☆76Updated 10 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models☆44Updated last month
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆86Updated 5 months ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling☆23Updated last week
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"☆59Updated last year
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆78Updated 3 weeks ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆55Updated 6 months ago
- The official implementation of Self-Exploring Language Models (SELM)☆62Updated 9 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis.☆124Updated 2 months ago
- EvaByte: Efficient Byte-level Language Models at Scale☆85Updated last week
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆74Updated 2 weeks ago
- Working implementation of DeepSeek MLA☆38Updated 2 months ago
- ☆16Updated 2 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆168Updated 2 months ago
- ☆87Updated 6 months ago
- Code for "Reasoning to Learn from Latent Thoughts"☆51Updated this week
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆95Updated 2 months ago
- ☆39Updated this week
- Official PyTorch Implementation for Task Vectors are Cross-Modal☆22Updated 3 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)☆83Updated last week
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆75Updated 3 weeks ago
- ☆59Updated 3 months ago
- ☆35Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆148Updated last week