long-horizon-execution / measuring-execution

☆46

Alternatives and similar repositories for measuring-execution

Users who are interested in measuring-execution are comparing it to the repositories listed below.

convergence-ai / lm2
Official repo of paper LM2
☆46 Updated 9 months ago
OpenMOSS / Lorsa
☆29 Updated 2 weeks ago
aakaran / reasoning-with-sampling
☆317 Updated 2 weeks ago
s-sahoo / Eso-LMs
Esoteric Language Models
☆106 Updated last month
JinjieNi / dlms-are-super-data-learners
The official GitHub repo for "Diffusion Language Models are Super Data Learners".
☆200 Updated 2 weeks ago
Zhiyuan-Zeng / RLVE
[Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments
☆134 Updated last week
StigLidu / DualDistill
[EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"
☆100 Updated 2 months ago
VsonicV / es-fine-tuning-paper
This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"
☆255 Updated last week
efficientscaling / Z1
[EMNLP'25 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"
☆66 Updated 7 months ago
MNoorFawi / curlora
The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation.
☆52 Updated last year
rbalestr-lab / llm-jepa
☆130 Updated last month
metal-chart-generation / metal
☆40 Updated 5 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆40 Updated last month
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆44 Updated 2 months ago
royeisen / reasoning_loading_bar
☆52 Updated 4 months ago
complex-reasoning / RPG
Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
☆54 Updated last month
katiekang1998 / reasoning_generalization
☆33 Updated 10 months ago
scitix / MEAP
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More
☆33 Updated 6 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆82 Updated 8 months ago
TergelMunkhbat / concise-reasoning
Code for the paper "Self-Training Elicits Concise Reasoning in Large Language Models"
☆42 Updated 7 months ago
tiiuae / Falcon-H1
All information and news with respect to Falcon-H1 series
☆93 Updated last month
ZihanWang314 / CoE
Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models
☆223 Updated 2 weeks ago
zhengkid / Parallel-R1
The official repo for "Parallel-R1: Towards Parallel Thinking via Reinforcement Learning"
☆233 Updated last week
zjunlp / DynamicKnowledgeCircuits
[ACL 2025] How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
☆44 Updated 4 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 Updated last year
rohinmanvi / Capability-Aware-and-Mid-Generation-Self-Evaluations
☆21 Updated 3 months ago
ZihanWang314 / coeCheck
☆19 Updated 8 months ago
BKHMSI / mixture-of-cognitive-reasoners
Mixture of Cognitive Reasoners: Modular Reasoning with Brain-Like Specialization
☆35 Updated last month
xufangzhi / Genius
[ACL 2025] A Generalizable and Purely Unsupervised Self-Training Framework
☆70 Updated 5 months ago
JayZhang42 / SLED
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models (https://arxiv.org/pdf/2411.02433)
☆110 Updated 11 months ago