long-horizon-execution / measuring-execution
☆48 · Updated 2 months ago
Alternatives and similar repositories for measuring-execution
Users who are interested in measuring-execution compare it to the repositories listed below.
- Esoteric Language Models ☆108 · Updated 2 weeks ago
- Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More ☆33 · Updated 6 months ago
- [EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code" ☆67 · Updated 8 months ago
- The official github repo for "Diffusion Language Models are Super Data Learners". ☆208 · Updated last month
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆54 · Updated last month
- ☆29 · Updated last month
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆136 · Updated 4 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆84 · Updated 8 months ago
- Official repo of paper LM2 ☆46 · Updated 10 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆224 · Updated last month
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆272 · Updated 2 weeks ago
- ☆52 · Updated 5 months ago
- [EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆100 · Updated 3 months ago
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆165 · Updated 2 weeks ago
- [ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training ☆45 · Updated 4 months ago
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments ☆154 · Updated 3 weeks ago
- ☆41 · Updated 6 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆115 · Updated 6 months ago
- ☆105 · Updated 3 months ago
- Code for paper called Self-Training Elicits Concise Reasoning in Large Language Models ☆42 · Updated 7 months ago
- ☆89 · Updated last year
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Model https://arxiv.org/pdf/2411.02433 ☆112 · Updated last year
- JudgeLRM: Large Reasoning Models as a Judge ☆40 · Updated this week
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling ☆40 · Updated last month
- ☆38 · Updated last year
- SSRL: Self-Search Reinforcement Learning ☆158 · Updated 3 months ago
- ☆342 · Updated last month
- Geometric-Mean Policy Optimization ☆95 · Updated 3 weeks ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆94 · Updated 6 months ago
- [ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications ☆51 · Updated last month