BY571 / DistRL-LLM
Distributed Reinforcement Learning for LLM Fine-Tuning with multi-GPU utilization
☆17 · Updated 2 months ago
Alternatives and similar repositories for DistRL-LLM
Users who are interested in DistRL-LLM are comparing it to the repositories listed below.
- Train your own SOTA deductive reasoning model · ☆93 · Updated 3 months ago
- Simple examples using Argilla tools to build AI · ☆53 · Updated 6 months ago
- Source code for our paper: "SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals". · ☆67 · Updated 11 months ago
- ☆49 · Updated 7 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. · ☆24 · Updated last month
- Accompanying material for the sleep-time compute paper · ☆90 · Updated last month
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna · ☆53 · Updated 4 months ago
- Repository for “PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers”, NAACL24 · ☆138 · Updated 11 months ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… · ☆70 · Updated 7 months ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… · ☆60 · Updated last week
- Data preparation code for CrystalCoder 7B LLM · ☆44 · Updated last year
- Training an LLM to use a calculator with multi-turn reinforcement learning, achieving a **62% absolute increase in evaluation accuracy**. · ☆38 · Updated last month
- LLM reads a paper and produces a working prototype · ☆57 · Updated last month
- ☆121 · Updated 2 months ago
- Complex Function Calling Benchmark. · ☆112 · Updated 4 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. · ☆87 · Updated last month
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems · ☆91 · Updated 3 months ago
- ☆83 · Updated last month
- ☆59 · Updated 2 weeks ago
- Official implementation of the paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038) · ☆183 · Updated 2 months ago
- ☆114 · Updated 3 months ago
- ☆53 · Updated last year
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" · ☆110 · Updated 8 months ago
- Fine-tunes a student LLM using teacher feedback for improved reasoning and answer quality. Implements GRPO with teacher-provided evaluati… · ☆43 · Updated last month
- ☆68 · Updated 3 months ago
- A simplified implementation for experimenting with RLVR on GSM8K. This repository provides a starting point for exploring reasoning. · ☆97 · Updated 4 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M… · ☆221 · Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper. · ☆78 · Updated 8 months ago
- [ACL 2025] Agentic Knowledgeable Self-awareness · ☆69 · Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. · ☆172 · Updated 4 months ago