sethkarten/continual-harness

Official repository of the paper: Continual Harness: Online Adaptation for Self-Improving Foundation Agents and PokeAgent Speedrun Track 2

☆251

Alternatives and similar repositories for continual-harness

Users that are interested in continual-harness are comparing it to the libraries listed below.

sethkarten / pokechamp
Official repository of the spotlight ICML 2025 paper, PokeChamp: an Expert-level Minimax Language Agent.
☆175 · Mar 11, 2026 · Updated 4 months ago
feng-rrRay / Continual-Harness-ARC-AGI-3
Official implementation of Continual Harness (arxiv.org/abs/2605.09998) on ARC-AGI-3
☆33 · Jul 3, 2026 · Updated 2 weeks ago
UT-Austin-RPL / metamon
Pokémon Showdown RL Agents and Datasets
☆119 · Jun 19, 2026 · Updated last month
WujiangXu / MemGym
The code for paper "MemGym: a Long-Horizon Memory Environment for LLM Agents".
☆18 · Jun 2, 2026 · Updated last month
THU-KEG / LongTraceRL
LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards
☆38 · Jun 1, 2026 · Updated last month
OpenWebRL / OpenWebRL
Code for paper OpenWebRL: Online Multi-Turn Reinforcement Learning for Visual Web Agents
☆37 · Jul 9, 2026 · Updated last week
dvruette / pokemon-emerald-experiments
Playing Pokemon Red with Reinforcement Learning
☆21 · Jul 28, 2025 · Updated 11 months ago
waynchi / gamedevbench
☆86 · Updated this week
cameronangliss / vgc-bench
An AI benchmark for Pokémon VGC with agent implementations using multi-agent reinforcement learning, behavior cloning, LLMs, and heuristi…
☆46 · Jul 4, 2026 · Updated 2 weeks ago
OPTML-Group / Unlearn-Smooth
[ICML25] Official repo for "Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond…
☆24 · Sep 27, 2025 · Updated 9 months ago
stanford-iris-lab / meta-harness-tbench2-artifact
Meta-Harness: 76.4% on Terminal-Bench 2.0 (Claude Opus 4.6)
☆1,149 · Mar 26, 2026 · Updated 3 months ago
benchflow-ai / pokemon-gym
☆96 · Jun 30, 2025 · Updated last year
Embodied-Minds-Lab / BES
We propose Bidirectional Evolutionary Search (BES), a search framework that couples forward candidate evolution with backward goal decomp…
☆166 · May 28, 2026 · Updated last month
vinid / einstein-arena
☆37 · Updated this week
liushulinle / MarsRL
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
☆18 · Nov 18, 2025 · Updated 8 months ago
lucidrains / populora
Implementation and explorations into PopuLoRA, Co-Evolving LLM Populations for Reasoning Self-Play
☆15 · Jun 3, 2026 · Updated last month
Model-GLUE / Model-GLUE
☆18 · Aug 19, 2024 · Updated last year
EIT-NLP / Connector-Selection-for-MLLM
[EMNLP 2024 Main] Official implementation of the paper "To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimoda…
☆17 · Dec 13, 2024 · Updated last year
mandyyyyii / east
☆19 · Aug 4, 2025 · Updated 11 months ago
A-EVO-Lab / a-evolve
The official repository of "Position: Agentic Evolution is the Path to Evolving LLMs".
☆700 · Jun 29, 2026 · Updated 3 weeks ago
zircote / rlm-rs
Rust CLI implementing the Recursive Language Model (RLM) pattern for Claude Code. Process documents 100x larger than context windows thro…
☆57 · Updated this week
ruixin31 / Spurious_Rewards
☆361 · Jul 29, 2025 · Updated 11 months ago
WooooDyy / AgentGym-RL
Code and implementations for the paper "AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcemen…
☆816 · Feb 15, 2026 · Updated 5 months ago
eliaka / repeatedgames
☆24 · Sep 23, 2024 · Updated last year
SalesforceAIResearch / socratic-self-refine-reasoning
☆27 · Jun 2, 2026 · Updated last month
bingreeky / opd-evolver
☆37 · Jun 17, 2026 · Updated last month
ASTRAL-Group / MonitorBench
[COLM 2026] Official implementation for "MonitorBench: A Comprehensive Benchmark for Chain-of-Thought Monitorability in Large Language Mo…
☆20 · Apr 23, 2026 · Updated 2 months ago
open-thoughts / OpenThoughts-Agent
Data recipes and robust infrastructure for training AI agents
☆260 · Updated this week
tmlr-group / G-effect
[ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"
☆16 · Feb 27, 2025 · Updated last year
OPTML-Group / Unlearn-WorstCase
[ECCV24] "Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning" by Chongyu Fan*, Jiancheng Liu*, Alfred Hero, …
☆28 · May 27, 2025 · Updated last year
SeokwonJung-Jay / MEME-public
MEME: Multi-Entity & Evolving Memory Evaluation - reference implementation (companion to arXiv preprint)
☆22 · May 11, 2026 · Updated 2 months ago
ankit-vaidya19 / Share
The Official PyTorch implementation of Shared LoRA Subspaces for almost Strict Continual Learning
☆33 · May 7, 2026 · Updated 2 months ago
RecursiveMAS / RecursiveMAS
Official Implementation for "Recursive Multi-Agent Systems"
☆897 · Jun 29, 2026 · Updated 3 weeks ago
bingreeky / MemEvolve
[ICML'26] MemEvolve & EvolveLab
☆255 · May 5, 2026 · Updated 2 months ago
Goekdeniz-Guelmez / mlx-embeddings-lora
Train Embedding Models on MLX.
☆17 · Jun 2, 2026 · Updated last month
meituan / MemOCR
MemOCR: an OCR-driven visual memory agent.
☆33 · May 17, 2026 · Updated 2 months ago
stanford-iris-lab / meta-harness
Reference code for the Meta-Harness paper.
☆1,310 · Jul 11, 2026 · Updated last week
ahnjaewoo / FlashAdventure
🕵 Code for our EMNLP 2025 Main paper: "FlashAdventure: A Benchmark for GUI Agents Solving Full Story Arcs in Diverse Adventure Games"
☆26 · Apr 26, 2026 · Updated 2 months ago
amodaresi / MemLLM
☆13 · Aug 13, 2024 · Updated last year