ch33nchan / RLlama
☆13 Updated last week
Alternatives and similar repositories for RLlama
Users interested in RLlama are comparing it to the libraries listed below.
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆98 Updated 2 months ago
- Simple repository for training small reasoning models ☆27 Updated 3 months ago
- Train transformer language models with reinforcement learning. ☆18 Updated 2 months ago
- Compiling useful links, papers, benchmarks, ideas, etc. ☆46 Updated 2 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆65 Updated 3 weeks ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆53 Updated 3 months ago
- ☆27 Updated 10 months ago
- Lego for GRPO ☆28 Updated last month
- ☆19 Updated 2 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆65 Updated last month
- Train your own SOTA deductive reasoning model ☆92 Updated 2 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon. ☆75 Updated 2 weeks ago
- ☆38 Updated 9 months ago
- Simple GRPO scripts and configurations. ☆58 Updated 3 months ago
- Clean RL implementation using MLX ☆30 Updated last year
- ☆46 Updated last month
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. ☆37 Updated last week
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace. ☆26 Updated 10 months ago
- This repository contains a simple llama3 implementation in pure JAX. ☆63 Updated 2 months ago
- MLX port of xjdr's entropix sampler (mimics the JAX implementation) ☆64 Updated 6 months ago
- Open source interpretability artefacts for R1. ☆109 Updated 3 weeks ago
- ☆54 Updated 3 months ago
- ☆30 Updated last week
- Lossily compress representation vectors using product quantization ☆53 Updated 3 weeks ago
- Optimized LLM inference for Apple Silicon using MLX. ☆10 Updated this week
- A framework for fine-tuning retrieval-augmented generation (RAG) systems. ☆26 Updated this week
- An introduction to LLM Sampling ☆78 Updated 5 months ago
- ☆125 Updated last month
- ☆129 Updated last month
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆31 Updated 3 weeks ago