SakanaAI / ALE-BenchLinks

The official repository of ALE-Bench

☆107

Alternatives and similar repositories for ALE-Bench

Users that are interested in ALE-Bench are comparing it to the libraries listed below

Sorting:

SakanaAI / RLT
Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling.
☆324Updated last month
SakanaAI / Sudoku-Bench
An AI benchmark for creative, human-like problem solving using Sudoku variants
☆84Updated last week
luchris429 / DiscoPOP
Code for Discovering Preference Optimization Algorithms with and for Large Language Models
☆63Updated last year
vsubramaniam851 / multiagent-ft
☆212Updated 5 months ago
goncalorafaria / qalign
QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods.
☆23Updated 4 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆103Updated 3 months ago
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆94Updated last week
SakanaAI / DiscoPOP
Code for Discovering Preference Optimization Algorithms with and for Large Language Models
☆189Updated last year
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆175Updated 5 months ago
SalesforceAIResearch / LaTRO
☆118Updated 5 months ago
google-deepmind / questbench
☆25Updated 2 months ago
benchflow-ai / pokemon-gym
☆82Updated last month
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173Updated 6 months ago
facebookresearch / collaborative-reasoner
Source code for the collaborative reasoner research project at Meta FAIR.
☆99Updated 3 months ago
SakanaAI / CycleQD
CycleQD is a framework for parameter space model merging.
☆42Updated 6 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149Updated 6 months ago
facebookresearch / MemoryMosaics
Memory Mosaics are networks of associative memories working in concert to achieve a prediction task.
☆47Updated 6 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆99Updated last month
jwhj / OREO
☆114Updated 6 months ago
SakanaAI / ab-mcts-arc2
☆91Updated last month
flowersteam / SOAR
Implementation of SOAR
☆37Updated this week
complex-reasoning / RPG
The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
☆35Updated last week
letta-ai / sleep-time-compute
accompanying material for sleep-time compute paper
☆99Updated 3 months ago
sunblaze-ucb / Intuitor
Code for the paper: "Learning to Reason without External Rewards"
☆344Updated 3 weeks ago
eqimp / hogwild_llm
Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache
☆116Updated 3 weeks ago
sher222 / LeReT
Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
☆49Updated 9 months ago
data-for-agents / insta
Official Repo for InSTA: Towards Internet-Scale Training For Agents
☆52Updated 3 weeks ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆72Updated 4 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆283Updated this week
Arvid-pku / Godel_Agent
Gödel Agent: A Self-Referential Agent Framework for Recursive Self-Improvement
☆118Updated 5 months ago