natolambert / rlhf-book
Textbook on reinforcement learning from human feedback
☆22 Updated last month
Related projects:
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆37 Updated 3 months ago
- ☆23 Updated 5 months ago
- Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents" ☆30 Updated 11 months ago
- Clean RL implementation using MLX ☆26 Updated 6 months ago
- Repo to reproduce the First-Explore paper results ☆36 Updated last year
- ☆17 Updated 3 months ago
- Implementation of Soft Actor Critic and some of its improvements in Pytorch ☆30 Updated this week
- Triton Implementation of HyperAttention Algorithm ☆46 Updated 9 months ago
- ☆17 Updated 4 months ago
- ☆25 Updated 5 months ago
- ☆29 Updated 2 weeks ago
- Generative cellular automaton-like learning environments for RL. ☆19 Updated last month
- Make triton easier ☆39 Updated 3 months ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆41 Updated 3 months ago
- Jax like function transformation engine but micro, microjax ☆24 Updated 3 weeks ago
- PyTorch Package For Quasimetric Learning ☆38 Updated last year
- ☆33 Updated last year
- Code for the paper "Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making" ☆20 Updated 2 months ago
- ☆28 Updated last week
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆29 Updated 3 weeks ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆23 Updated last year
- OMNI-EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code ☆20 Updated 3 weeks ago
- Efficient World Models with Context-Aware Tokenization. ICML 2024 ☆73 Updated 2 months ago
- The official implementation of the paper "Read to Play (R2-Play): Decision Transformer with Multimodal Game Instruction". ☆32 Updated 7 months ago
- ☆40 Updated 4 months ago
- MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection ☆45 Updated last year
- ☆11 Updated 2 months ago
- ☆38 Updated 8 months ago
- 📜 [ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswa… ☆36 Updated 10 months ago
- ☆27 Updated this week