google-deepmind / questbenchLinks

☆32

Alternatives and similar repositories for questbench

Users that are interested in questbench are comparing it to the libraries listed below

Sorting:

Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆112Updated 3 months ago
SalesforceAIResearch / LaTRO
☆124Updated 9 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆151Updated 9 months ago
shenao-zhang / SELM
The official implementation of Self-Exploring Language Models (SELM)
☆63Updated last year
sunblaze-ucb / reasoning_ladder
☆35Updated 6 months ago
princeton-nlp / USACO
Can Language Models Solve Olympiad Programming?
☆122Updated 10 months ago
MLE-Dojo / MLE-Dojo
☆78Updated 3 weeks ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆112Updated 5 months ago
letta-ai / sleep-time-compute
accompanying material for sleep-time compute paper
☆117Updated 6 months ago
data-for-agents / insta
Official Repo for InSTA: Towards Internet-Scale Training For Agents
☆56Updated 4 months ago
WindyLee0822 / Process_Q_Model
official implementation of paper "Process Reward Model with Q-value Rankings"
☆65Updated 9 months ago
codezakh / DataEnvGym
A testbed for agents and environments that can automatically improve models through data generation.
☆27Updated 8 months ago
jwhj / OREO
☆117Updated 10 months ago
allenai / super-benchmark
☆49Updated 7 months ago
architsharma97 / dpo-rlaif
☆100Updated last year
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆42Updated last year
sail-sg / VeriFree
Reinforcing General Reasoning without Verifiers
☆92Updated 5 months ago
OSU-NLP-Group / reversal-curse-binding
☆24Updated 7 months ago
sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆51Updated last year
convergence-ai / lm2
Official repo of paper LM2
☆46Updated 9 months ago
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆90Updated 11 months ago
JHU-CLSP / RATIONALYST
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
☆35Updated last year
zitian-gao / SC-MCTS
Interpretable Contrastive Monte Carlo Tree Search Reasoning
☆48Updated last year
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆75Updated last year
mukhal / ThinkPRM
Process Reward Models That Think
☆63Updated last month
ZonglinY / MOOSE
[ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award …
☆42Updated last year
google-deepmind / bbeh
☆102Updated 6 months ago
JacobPfau / fillerTokens
☆75Updated last year
kilian-group / phantom-wiki
Python package for generating datasets to evaluate reasoning and retrieval of large language models
☆19Updated 2 months ago
probabilistic-inference-scaling / probabilistic-inference-scaling
☆52Updated 8 months ago