kvfrans / lmpo
☆108 Updated last week
Alternatives and similar repositories for lmpo
Users that are interested in lmpo are comparing it to the libraries listed below
- ☆101 Updated this week
- rl from zero pretrain, can it be done? yes. ☆268 Updated last month
- Minimal but scalable implementation of large language models in JAX ☆35 Updated 2 weeks ago
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆191 Updated last month
- Cost aware hyperparameter tuning algorithm ☆169 Updated last year
- Training-Ready RL Environments + Evals ☆90 Updated last week
- A Gym for Generalist LLMs ☆122 Updated this week
- Learn online intrinsic rewards from LLM feedback ☆43 Updated 9 months ago
- Synchronized Curriculum Learning for RL Agents ☆113 Updated 3 weeks ago
- A simple, performant and scalable JAX-based world modeling codebase ☆73 Updated last week
- OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025). ☆67 Updated 8 months ago
- XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning (ICLR 2025) ☆78 Updated 7 months ago
- Compiling useful links, papers, benchmarks, ideas, etc. ☆45 Updated 6 months ago
- 📄 Small Batch Size Training for Language Models ☆60 Updated 3 weeks ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆64 Updated 6 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆96 Updated 6 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆96 Updated last month
- Minimal yet performant LLM examples in pure JAX ☆158 Updated last week
- ☆217 Updated 7 months ago
- A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning ☆276 Updated this week
- Efficient baselines for autocurricula in JAX. ☆196 Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers ☆71 Updated 4 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆119 Updated 4 months ago
- Dion optimizer algorithm ☆343 Updated 2 weeks ago
- Official repository of the spotlight ICML 2025 paper, PokeChamp: an Expert-level Minimax Language Agent. ☆105 Updated last month
- Flax (Jax) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face. ☆22 Updated 7 months ago
- Efficiently discovering algorithms via LLMs with evolutionary search and reinforcement learning. ☆110 Updated last month
- ☆187 Updated last month
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆62 Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆160 Updated 2 months ago