kvfrans / lmpo

☆83

Alternatives and similar repositories for lmpo

Users who are interested in lmpo are comparing it to the libraries listed below.

conglu1997 / intelligent-go-explore
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
☆60 Updated 4 months ago
young-geng / mintext
Minimal but scalable implementation of large language models in JAX
☆35 Updated last week
facebookresearch / oni
Learn online intrinsic rewards from LLM feedback
☆41 Updated 6 months ago
mnoukhov / async_rlhf
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
☆59 Updated 2 months ago
clement-bonnet / lpn
Latent Program Network (from the "Searching Latent Program Spaces" paper)
☆91 Updated 4 months ago
LeonGuertler / UnstableBaselines
☆90 Updated this week
balrog-ai / BALROG
Benchmarking Agentic LLM and VLM Reasoning On Games
☆166 Updated 2 months ago
tyler-romero / microR1
Simple repository for training small reasoning models
☆33 Updated 5 months ago
imbue-ai / carbs
Cost aware hyperparameter tuning algorithm
☆162 Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆147 Updated 2 weeks ago
maxencefaldor / omni-epic
OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025).
☆60 Updated 6 months ago
CLAIRE-Labo / EvoTune
Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning.
☆103 Updated this week
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆87 Updated 2 weeks ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training
☆129 Updated last year
google-deepmind / regress-lm
Library for text-to-text regression, applicable to any input string representation and allows pretraining and fine-tuning over multiple r…
☆86 Updated this week
dunnolab / xland-minigrid-datasets
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning (ICLR 2025)
☆75 Updated 5 months ago
abdulhaim / LMRL-Gym
☆98 Updated last year
vmicheli / delta-iris
Efficient World Models with Context-Aware Tokenization. ICML 2024
☆105 Updated 9 months ago
lucidrains / improving-transformers-world-model-for-rl
Implementation of the new SOTA for model-based RL, from the paper "Improving Transformer World Models for Data-Efficient RL", in PyTorch
☆128 Updated 2 months ago
BatsResearch / planetarium
Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL
☆56 Updated 9 months ago
Cornell-RL / tril
☆127 Updated last year
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆129 Updated this week
google-deepmind / asyncdiloco
☆45 Updated last year
sethkarten / pokechamp
Official repository of the spotlight ICML 2025 paper, PokeChamp: an Expert-level Minimax Language Agent.
☆69 Updated this week
complex-reasoning / RPG
The official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508)
☆35 Updated last week
spiral-rl / spiral
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning
☆103 Updated last week
data-for-agents / insta
Official Repo for InSTA: Towards Internet-Scale Training For Agents
☆48 Updated last week
facebookresearch / motif
Intrinsic Motivation from Artificial Intelligence Feedback
☆129 Updated last year
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆173 Updated 4 months ago
rail-berkeley / SUPE
This code accompanies the paper "Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration."
☆28 Updated this week