bbartoldson / TBA
Official implementation of TBA for async LLM post-training.
☆20 · Updated 4 months ago
Alternatives and similar repositories for TBA
Users who are interested in TBA are comparing it to the repositories listed below.
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆64 · Updated 5 months ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆236 · Updated last week
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"