openai / preparedness

Releases from OpenAI Preparedness

☆815

Alternatives and similar repositories for preparedness

Users who are interested in preparedness compare it to the repositories listed below.


facebookresearch / swe-rl
Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
☆573 · Updated 4 months ago
SWE-Gym / SWE-Gym
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
☆516 · Updated last week
openai / mle-bench
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
☆823 · Updated last month
microsoft / rStar
☆608 · Updated 3 weeks ago
qixucen / atom
Atom of Thoughts for Markov LLM Test-Time Scaling
☆580 · Updated last month
facebookresearch / coconut
Training Large Language Model to Reason in a Continuous Latent Space
☆1,224 · Updated 6 months ago
facebookresearch / MLGym
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
☆538 · Updated 2 weeks ago
openai / SWELancer-Benchmark
This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E…
☆1,433 · Updated 3 weeks ago
seal-rg / recurrent-pretraining
Pretraining and inference code for a large-scale depth-recurrent language model
☆808 · Updated 3 weeks ago
CharlesQ9 / Alita
☆761 · Updated 2 months ago
TheAgentCompany / TheAgentCompany
An agent benchmark with tasks in a simulated software company.
☆515 · Updated last week
laude-institute / terminal-bench
A benchmark for LLMs on complicated tasks in the terminal
☆358 · Updated this week
GAIR-NLP / LIMO
[COLM 2025] LIMO: Less is More for Reasoning
☆993 · Updated last week
Continual-Intelligence / SEAL
Self-Adapting Language Models
☆743 · Updated last week
BytedTsinghua-SIA / MemAgent
A MemAgent framework that can be extrapolated to 3.5M, along with a training framework for RL training of any agent workflow.
☆588 · Updated last week
IntologyAI / Zochi
Repository for Zochi's Research
☆248 · Updated last month
SWE-bench / SWE-smith
Scaling Data for SWE-agents
☆342 · Updated this week
DreamLM / Dream
Dream 7B, a large diffusion language model
☆873 · Updated last month
open-thought / reasoning-gym
Procedural reasoning datasets
☆1,012 · Updated this week
hkust-nlp / CodeIO
[ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction
☆537 · Updated 3 months ago
NovaSky-AI / SkyRL
SkyRL: A Modular Full-stack RL Library for LLMs
☆698 · Updated this week
ServiceNow / AgentLab
AgentLab: An open-source framework for developing, testing, and benchmarking web agents on diverse tasks, designed for scalability and re…
☆373 · Updated this week
arcprize / arc-agi-benchmarking
Testing baseline LLM performance across various models
☆293 · Updated 2 weeks ago
Agent-RL / ReCall
ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning
☆1,116 · Updated 2 months ago
SakanaAI / self-adaptive-llms
A self-adaptation framework 🐙 that adapts LLMs to unseen tasks in real time!
☆1,132 · Updated 6 months ago
ByteDance-Seed / Seed-Thinking-v1.5
☆802 · Updated 2 months ago
aw31 / openai-imo-2025-proofs
☆466 · Updated 3 weeks ago
google-deepmind / alphaevolve_results
☆213 · Updated last month
multi-agent-systems-failure-taxonomy / MAST
☆248 · Updated 2 weeks ago
sail-sg / understand-r1-zero
Understanding R1-Zero-Like Training: A Critical Perspective
☆1,055 · Updated 2 weeks ago