Laz4rz / RL
☆16 Updated 3 months ago
Alternatives and similar repositories for RL
Users who are interested in RL are comparing it to the libraries listed below.
- ☆46 Updated last month
- MLX port for xjdr's entropix sampler (mimics jax implementation) ☆64 Updated 6 months ago
- ☆41 Updated 4 months ago
- Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments. ☆29 Updated 3 weeks ago
- Compiling useful links, papers, benchmarks, ideas, etc. ☆46 Updated 2 months ago
- Lego for GRPO ☆28 Updated last month
- aesthetic tensor visualiser ☆20 Updated 3 weeks ago
- Fine tune Gemma 3 on an object detection task ☆20 Updated this week
- ☆38 Updated 9 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆75 Updated 2 weeks ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆65 Updated 3 weeks ago
- lossily compress representation vectors using product quantization ☆53 Updated 3 weeks ago
- An introduction to LLM Sampling ☆78 Updated 5 months ago
- in this repository, i'm going to implement increasingly complex llm inference optimizations ☆28 Updated this week
- So, I trained a 130M Llama architecture I coded from the ground up to build a small instruct model from scratch. Trained on FineWeb dataset… ☆14 Updated last month
- Simple repository for training small reasoning models ☆27 Updated 3 months ago
- Extensive introductory writeup on Zig language functionalities ☆10 Updated 10 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆98 Updated 2 months ago
- ☆27 Updated 10 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆65 Updated last month
- Modify Entropy Based Sampling to work with Mac Silicon via MLX ☆50 Updated 6 months ago
- ☆93 Updated 7 months ago
- Simple GRPO scripts and configurations. ☆58 Updated 3 months ago
- look how they massacred my boy ☆63 Updated 7 months ago
- Let's make all Machine learning algorithms from scratch! ☆13 Updated 10 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.