long-horizon-execution / measuring-execution
☆44 · Updated last month
Alternatives and similar repositories for measuring-execution
Users that are interested in measuring-execution are comparing it to the libraries listed below.
- ☆215 · Updated 2 weeks ago
- Esoteric Language Models ☆103 · Updated 3 weeks ago
- ☆29 · Updated 4 months ago
- The official GitHub repo for "Diffusion Language Models are Super Data Learners". ☆139 · Updated last week
- [EMNLP 2025] The official implementation for the paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆101 · Updated 2 months ago
- Training teachers with reinforcement learning to make LLMs learn how to reason for test-time scaling. ☆347 · Updated 4 months ago
- Official repo of the paper LM2 ☆47 · Updated 8 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆53 · Updated 2 weeks ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆229 · Updated this week
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆79 · Updated 7 months ago
- ☆33 · Updated 9 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆222 · Updated last month
- Code for the paper: "Learning to Reason without External Rewards" ☆369 · Updated 3 months ago
- ☆40 · Updated 5 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆129 · Updated 2 months ago
- [EMNLP'2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆66 · Updated 6 months ago
- Code for the paper "Self-Training Elicits Concise Reasoning in Large Language Models" ☆42 · Updated 6 months ago
- Minimal GRPO implementation from scratch ☆98 · Updated 7 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆108 · Updated 4 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆93 · Updated 5 months ago
- The code repository of the paper: Competition and Attraction Improve Model Fusion ☆163 · Updated 2 months ago
- ☆52 · Updated last year
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation. ☆51 · Updated last year
- ☆65 · Updated 7 months ago
- SSRL: Self-Search Reinforcement Learning ☆148 · Updated 2 months ago
- This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024. ☆104 · Updated last year
- Code implementation, evaluations, documentation, links, and resources for the Min P paper ☆43 · Updated 2 months ago
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆45 · Updated 3 months ago
- [ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework ☆71 · Updated 5 months ago
- Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. TMLR 2025. ☆118 · Updated last month