benpry / why-think-step-by-step

Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"

☆62

Alternatives and similar repositories for why-think-step-by-step

Users who are interested in why-think-step-by-step are comparing it to the repositories listed below.

sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆51 · Updated last year
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆151 · Updated 10 months ago
allenai / infinigram-api
☆87 · Updated this week
minosvasilias / simple_grpo
Simple GRPO scripts and configurations.
☆59 · Updated 10 months ago
Alex-Gurung / ReasoningNCP
Official repo for Learning to Reason for Long-Form Story Generation
☆72 · Updated 7 months ago
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆112 · Updated last month
StigLidu / DualDistill
[EMNLP 2025] The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"
☆100 · Updated 3 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆90 · Updated last year
HyperPotatoNeo / RSA
☆76 · Updated 2 months ago
du-nlp-lab / MLR-Copilot
☆67 · Updated 8 months ago
bespokelabsai / verifiers
Verifiers for LLM Reinforcement Learning
☆80 · Updated 7 months ago
letta-ai / sleep-time-compute
accompanying material for sleep-time compute paper
☆118 · Updated 7 months ago
oriyor / assistantbench
Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆66 · Updated 11 months ago
ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆34 · Updated 7 months ago
SalesforceAIResearch / LaTRO
☆124 · Updated 9 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173 · Updated 10 months ago
VITA-Group / ChainCoder
[ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …
☆43 · Updated 2 years ago
JacobPfau / fillerTokens
☆75 · Updated last year
allenai / CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
☆93 · Updated last year
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆84 · Updated 8 months ago
likenneth / q_probe
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
☆41 · Updated last year
ahstat / episodic-memory-benchmark
Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…
☆60 · Updated 2 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60 · Updated last year
KaiNylund / lm-weights-encode-time
☆69 · Updated last year
google-deepmind / questbench
☆34 · Updated 6 months ago
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆169 · Updated last year
JHU-CLSP / RATIONALYST
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
☆35 · Updated last year
yale-nlp / SciArena
Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"
☆55 · Updated 4 months ago
joshuacnf / Ctrl-G
☆105 · Updated 11 months ago
arcee-ai / DAM
☆55 · Updated last year