ch33nchan / RLlama
☆15 · Updated this week
Alternatives and similar repositories for RLlama
Users who are interested in RLlama are comparing it to the libraries listed below.
- Simple repository for training small reasoning models ☆31 · Updated 4 months ago
- Compiling useful links, papers, benchmarks, ideas, etc. ☆46 · Updated 2 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆100 · Updated 3 months ago
- ☆36 · Updated 2 weeks ago
- ☆27 · Updated 10 months ago
- rl from zero pretrain, can it be done? we'll see. ☆24 · Updated this week
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆64 · Updated 7 months ago
- Lego for GRPO ☆28 · Updated last week
- Train your own SOTA deductive reasoning model ☆93 · Updated 3 months ago
- This repository contains a simple Llama 3 implementation in pure JAX. ☆64 · Updated 3 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆66 · Updated last month
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆58 · Updated 2 weeks ago
- Train transformer language models with reinforcement learning. ☆19 · Updated 3 months ago
- ☆46 · Updated 2 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆76 · Updated last month
- coding CUDA everyday! ☆33 · Updated last month
- ☆59 · Updated 2 weeks ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆67 · Updated 2 months ago
- look how they massacred my boy ☆63 · Updated 7 months ago
- ☆20 · Updated 2 months ago
- coloring terminal text with intensities (used for plotting probability, entropy with tokens) ☆12 · Updated 7 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆38 · Updated last month
- A repository to unravel the language of GPUs, making their kernel conversations easy to understand ☆185 · Updated this week
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆29 · Updated 11 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆86 · Updated 2 months ago
- Open source interpretability artefacts for R1. ☆140 · Updated last month
- PyTorch implementations of algorithms from "Reinforcement Learning: An Introduction by Sutton and Barto", along with various RL research … ☆139 · Updated last week
- BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO ☆58 · Updated 7 months ago
- minimal GRPO implementation from scratch ☆90 · Updated 2 months ago
- ☆86 · Updated 2 weeks ago