kvfrans / lmpo
☆128 · Updated 2 weeks ago
Alternatives and similar repositories for lmpo
Users who are interested in lmpo are comparing it to the repositories listed below.
- A simple, performant and scalable JAX-based world modeling codebase ☆113 · Updated last month
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆207 · Updated last week
- ☆107 · Updated last week
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆272 · Updated 2 weeks ago
- Minimal but scalable implementation of large language models in JAX ☆35 · Updated 2 weeks ago
- RLP: Reinforcement as a Pretraining Objective ☆205 · Updated 2 months ago
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆112 · Updated 2 months ago
- rl from zero pretrain, can it be done? yes. ☆282 · Updated 2 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 · Updated 7 months ago
- Synchronized Curriculum Learning for RL Agents ☆116 · Updated last month
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆106 · Updated 2 weeks ago
- Flax (Jax) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face. ☆26 · Updated 9 months ago
- Cost aware hyperparameter tuning algorithm ☆176 · Updated last year
- Implementation of the new SOTA for model based RL, from the paper "Improving Transformer World Models for Data-Efficient RL", in Pytorch ☆145 · Updated 7 months ago
- 📄 Small Batch Size Training for Language Models ☆68 · Updated 2 months ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆65 · Updated 9 months ago
- ☆463 · Updated 3 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆174 · Updated 5 months ago
- OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (ICLR 2025). ☆71 · Updated 11 months ago
- Supporting code for the blog post on modular manifolds. ☆104 · Updated 2 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆133 · Updated 7 months ago
- XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning (ICLR 2025) ☆81 · Updated 9 months ago
- Learn online intrinsic rewards from LLM feedback ☆45 · Updated 11 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 · Updated last year
- minimal Energy-based transformer ☆41 · Updated last month
- Minimal yet performant LLM examples in pure JAX ☆207 · Updated last week
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆322 · Updated this week
- ☆224 · Updated 2 weeks ago
- Efficient World Models with Context-Aware Tokenization. ICML 2024 ☆114 · Updated last year
- Training-Ready RL Environments + Evals ☆185 · Updated last week