willccbb / trl
Train transformer language models with reinforcement learning.
☆18Updated 2 months ago
Alternatives and similar repositories for trl
Users who are interested in trl are comparing it to the libraries listed below
- ☆65Updated 2 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆65Updated last month
- ☆50Updated 5 months ago
- look how they massacred my boy☆63Updated 7 months ago
- MLX port for xjdr's entropix sampler (mimics jax implementation)☆64Updated 6 months ago
- An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning☆31Updated 3 months ago
- ☆44Updated this week
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆53Updated 3 months ago
- Simple examples using Argilla tools to build AI☆52Updated 5 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.☆22Updated last month
- One click away from a locally downloaded, fine-tuned model, hosted on Hugging Face, with inference built in. In two hours.☆21Updated 2 months ago
- An example implementation of RLHF (or, more accurately, RLAIF) built on MLX and HuggingFace.☆26Updated 10 months ago
- Lego for GRPO☆28Updated last month
- A pure MLX-based training pipeline for fine-tuning LLMs using GRPO on Apple Silicon.☆38Updated 3 months ago
- LLM reads a paper and produces a working prototype☆56Updated last month
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)☆77Updated last month
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆90Updated 3 months ago
- Train your own SOTA deductive reasoning model☆92Updated 2 months ago
- ☆29Updated last year
- Personal project, Generative AI, Streamlit, Python☆52Updated 2 weeks ago
- Simple GRPO scripts and configurations.☆58Updated 3 months ago
- ☆54Updated 3 months ago
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**.☆37Updated last week
- This codebase demonstrates various DSPy functionalities through practical examples.☆41Updated 3 months ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)☆110Updated 3 months ago
- A simple MLX implementation for pretraining LLMs on Apple Silicon.☆75Updated 2 weeks ago
- A Python library to orchestrate LLMs in a neural network-inspired structure☆47Updated 7 months ago
- Build a Recommendation System Agent using LATS Agent Approach☆29Updated 2 months ago
- AI agent with RAG+ReAct on Indian Constitution & BNS☆64Updated 6 months ago
- A list of AI memory projects☆102Updated 4 months ago