willccbb / trl
Train transformer language models with reinforcement learning.
☆19Updated 4 months ago
Alternatives and similar repositories for trl
Users who are interested in trl are comparing it to the libraries listed below.
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?☆68Updated 3 months ago
- ☆63Updated last month
- An overview of GRPO & DeepSeek-R1 Training with Open Source GRPO Model Fine Tuning☆32Updated last month
- ☆69Updated 4 months ago
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.☆30Updated 2 months ago
- ☆50Updated 3 weeks ago
- Simple examples using Argilla tools to build AI☆53Updated 7 months ago
- ☆36Updated 4 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna☆53Updated 4 months ago
- Simple GRPO scripts and configurations.☆58Updated 4 months ago
- This repository contains popular code generation frameworks such as MapCoder, CodeSIM.☆52Updated this week
- Train your own SOTA deductive reasoning model☆94Updated 3 months ago
- ☆61Updated 3 weeks ago
- ☆20Updated last week
- Official Code Release for "Training a Generally Curious Agent"☆25Updated last month
- ☆21Updated 3 weeks ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)☆115Updated 4 months ago
- ☆34Updated 3 months ago
- A framework for fine-tuning retrieval-augmented generation (RAG) systems.☆112Updated this week
- Lego for GRPO☆28Updated last month
- ☆127Updated 3 months ago
- LLM reads a paper and produces a working prototype☆57Updated 2 months ago
- ☆22Updated 7 months ago
- ☆51Updated 7 months ago
- How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training☆36Updated 2 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆70Updated 6 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)☆91Updated 5 months ago
- Really quick-and-dirty example of AI recursive learning☆26Updated 7 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…☆45Updated 2 months ago
- Example implementation of Iteration of Thought - Give a star if you like the project☆41Updated 6 months ago