kvfrans / lmpo
☆113Updated last week
Alternatives and similar repositories for lmpo
Users that are interested in lmpo are comparing it to the libraries listed below
- ☆103Updated this week
- Minimal but scalable implementation of large language models in JAX☆35Updated last month
- A simple, performant and scalable JAX-based world modeling codebase☆76Updated this week
- RLP: Reinforcement as a Pretraining Objective☆155Updated this week
- Cost aware hyperparameter tuning algorithm☆171Updated last year
- Benchmarking Agentic LLM and VLM Reasoning On Games☆197Updated last month
- Official repository of the spotlight ICML 2025 paper, PokeChamp: an Expert-level Minimax Language Agent.☆108Updated last week
- Synchronized Curriculum Learning for RL Agents☆113Updated last month
- XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning (ICLR 2025)☆80Updated 7 months ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning"☆114Updated this week
- rl from zero pretrain, can it be done? yes.☆275Updated last week
- OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025).☆68Updated 9 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs.☆164Updated 3 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates.☆207Updated this week
- Latent Program Network (from the "Searching Latent Program Spaces" paper)☆98Updated last week
- 📄Small Batch Size Training for Language Models☆63Updated last week
- A Gym for Agentic LLMs☆233Updated this week
- ☆188Updated last month
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning☆283Updated this week
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training☆132Updated last year
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models☆65Updated 7 months ago
- Training-Ready RL Environments + Evals☆121Updated this week
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…☆99Updated this week
- ☆114Updated last month
- ☆222Updated last week
- Learn online intrinsic rewards from LLM feedback☆43Updated 9 months ago
- Efficient World Models with Context-Aware Tokenization. ICML 2024☆108Updated last year
- Flax (Jax) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face.☆21Updated 7 months ago
- Open source interpretability artefacts for R1.☆161Updated 5 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers☆72Updated 5 months ago