ScalingIntelligence / Archon

Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.

☆187

Alternatives and similar repositories for Archon

Users who are interested in Archon are comparing it to the libraries listed below.

SalesforceAIResearch / LaTRO
☆122 Updated 8 months ago
letta-ai / sleep-time-compute
Accompanying material for the sleep-time compute paper
☆117 Updated 5 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172 Updated 9 months ago
data-for-agents / insta
Official Repo for InSTA: Towards Internet-Scale Training For Agents
☆56 Updated 3 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆151 Updated 8 months ago
PrimeIntellect-ai / genesys
☆135 Updated 7 months ago
withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆145 Updated last year
facebookresearch / meta-agents-research-environments
Meta Agents Research Environments is a comprehensive platform designed to evaluate AI agents in dynamic, realistic scenarios. Unlike stat…
☆321 Updated last week
R2E-Gym / R2E-Gym
[COLM 2025] Official repository for R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents
☆176 Updated 3 months ago
ScalingIntelligence / codemonkeys
☆58 Updated 8 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆291 Updated 2 weeks ago
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆260 Updated this week
METR / RE-Bench
☆113 Updated last week
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆89 Updated last year
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆163 Updated last year
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆233 Updated 3 months ago
LeonGuertler / UnstableBaselines
☆105 Updated this week
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆107 Updated 3 months ago
suzgunmirac / dynamic-cheatsheet
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
☆143 Updated 5 months ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆79 Updated 7 months ago
WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆251 Updated 6 months ago
SWE-bench / SWE-smith
[NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents
☆432 Updated this week
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
☆216 Updated this week
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆93 Updated 5 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆108 Updated 7 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆110 Updated 6 months ago
MLE-Dojo / MLE-Dojo
☆76 Updated last month
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆201 Updated last week
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆163 Updated 6 months ago
aorwall / moatless-tree-search
☆117 Updated 4 months ago