open-thought/reasoning-gym-eval

Collection of LLM completions for reasoning-gym task datasets

☆31

Alternatives and similar repositories for reasoning-gym-eval

Users who are interested in reasoning-gym-eval are comparing it to the repositories listed below.

zafstojano / policy-gradients
A minimal hackable implementation of policy gradient methods (GRPO, PPO, REINFORCE)
☆16 · Feb 20, 2026 · Updated 5 months ago
open-thought / reasoning-gym
[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards
☆1,469 · Apr 17, 2026 · Updated 3 months ago
JeanKaddour / tpo
Target Policy Optimization (JAX)
☆30 · Apr 18, 2026 · Updated 3 months ago
XuchanBao / behavioral-self-awareness
☆37 · Feb 20, 2025 · Updated last year
hppRC / llm-translator
Mixtral-based Ja-En (En-Ja) Translation model
☆20 · Jan 6, 2025 · Updated last year
MohamedOsman1998 / deep-learning-for-arc
☆15 · Jun 19, 2025 · Updated last year
unconst / boltzmann
Incentivized Training over Wide Web with 1000x model compression.
☆22 · Oct 30, 2024 · Updated last year
seanmacavaney / plaidrepro
☆11 · Feb 9, 2024 · Updated 2 years ago
wrmedford / moe-scaling
Scaling Laws for Mixture of Experts Models
☆15 · Feb 25, 2025 · Updated last year
zafstojano / wordgamebench
Evaluating language models on word puzzle games
☆10 · Oct 25, 2024 · Updated last year
yihong-chen / ReFactorGNN
Implementation for ReFactor GNNs
☆15 · Jun 10, 2025 · Updated last year
VisuLogic-Benchmark / VisuLogic-Train
☆21 · Jul 9, 2025 · Updated last year
Lux0926 / ASPRM
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence
☆10 · Mar 2, 2025 · Updated last year
ALT-JS / OthelloSAE
CS194-196 Course Project
☆14 · Feb 20, 2025 · Updated last year
cpldcpu / llmbenchmark
Various LLM Benchmarks
☆26 · Feb 20, 2026 · Updated 5 months ago
Oxen-AI / Self-Rewarding-Language-Models
This is work done by the Oxen.ai Community, trying to reproduce the Self-Rewarding Language Model paper from MetaAI.
☆135 · Nov 16, 2024 · Updated last year
one-covenant / grail
interplanetary intelligence
☆25 · Apr 10, 2026 · Updated 3 months ago
jataware / XRR2
Expand -> Retrieve -> Rerank - simple method with strong results on BRIGHT benchmark
☆22 · Aug 22, 2025 · Updated 11 months ago
Zayne-sprague / To-CoT-or-not-to-CoT
☆26 · Apr 10, 2025 · Updated last year
google-deepmind / egg
☆19 · Apr 15, 2026 · Updated 3 months ago
PrimeIntellect-ai / prime-iroh
Asynchronous P2P communication backend for decentralized pipeline parallelism
☆46 · Updated this week
muellerzr / smol-moe
☆25 · Oct 10, 2025 · Updated 9 months ago
icip-cas / AutoAlign
A toolkit for automated alignment research.
☆15 · Jul 3, 2026 · Updated 3 weeks ago
jaryaman / causal-inference-cheat-sheet
Summary of useful results in Causal Inference
☆20 · May 8, 2021 · Updated 5 years ago
PrimeIntellect-ai / lab-cookbook
Lab Cookbook
☆38 · Updated this week
IgorWounds / Backtester101
A proof-of-concept custom backtester
☆23 · Apr 3, 2024 · Updated 2 years ago
google-deepmind / questbench
QuestBench: Can LLMs ask the right question to acquire information in reasoning tasks?
☆43 · Jun 30, 2026 · Updated 3 weeks ago
naoya-i / r4c
r4c
☆14 · Mar 2, 2021 · Updated 5 years ago
Strong-AI-Lab / Logical-and-abstract-reasoning
Evaluation on Logical Reasoning and Abstract Reasoning Challenges
☆30 · Apr 21, 2025 · Updated last year
FloyedShen / VESPO
☆34 · Feb 12, 2026 · Updated 5 months ago
AndonLabs / multiagent-inspect
☆24 · Feb 5, 2025 · Updated last year
bugbytes-io / fastui-sqlmodel-demo
Demo that extends the FastUI example & adds database persistence
☆17 · Jan 2, 2024 · Updated 2 years ago
vectara / FaithJudge
☆19 · Nov 11, 2025 · Updated 8 months ago
wrmedford / llm720
Second Generation of Large Language Models
☆21 · Jun 30, 2025 · Updated last year
rosieyzh / openrlhf-pretrain
Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"
☆29 · Oct 14, 2025 · Updated 9 months ago
nishadsinghi / sc-genrm-scaling
[COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…
☆15 · Oct 31, 2025 · Updated 8 months ago
Ronsor / llama-tools
Tools for the LLaMA language model
☆12 · Apr 4, 2023 · Updated 3 years ago
ltgoslo / bert-in-context
Official implementation of "BERTs are Generative In-Context Learners"
☆32 · Mar 14, 2025 · Updated last year
CarperAI / nmmo-environment
Neural MMO - A Massively Multiagent Environment for Artificial Intelligence Research
☆15 · May 30, 2024 · Updated 2 years ago