inclusionAI / ASearcher
☆23Updated this week
Alternatives and similar repositories for ASearcher
Users that are interested in ASearcher are comparing it to the libraries listed below
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆241Updated 5 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆190Updated last year
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆218Updated this week
- A collection of LLM with RL papers☆276Updated last year
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆307Updated 3 months ago
- ☆154Updated 6 months ago
- An Awesome List of Reinforcement Learning-based Large Language Agent Works. Collect directly from official code base.☆243Updated this week
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior.☆245Updated 3 months ago
- [NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett…☆280Updated 8 months ago
- ☆147Updated 8 months ago
- A Massively Parallel Large Scale Self-Play Framework☆351Updated 2 years ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents☆34Updated last year
- ☆19Updated 8 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆185Updated 3 months ago
- Tutorial for Ray☆28Updated last year
- A Telegram bot to recommend arXiv papers☆281Updated 4 months ago
- A tiny reproduction of DeepSeek-R1-Zero on two A100s.☆70Updated 6 months ago
- Train your grpo with zero dataset and low resources, 8bit/4bit/lora/qlora supported, multi-gpu supported ...☆75Updated 3 months ago
- ☆176Updated last month
- ☆263Updated 2 months ago
- ☆164Updated 2 weeks ago
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆271Updated 3 weeks ago
- RLHF implementation details of OAI's 2019 codebase☆187Updated last year
- A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models including LLMs and VLMs.☆374Updated last year
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆188Updated 4 months ago
- Simplest AlphaZero Implementation☆22Updated 9 months ago
- On Memorization of Large Language Models in Logical Reasoning☆70Updated 4 months ago
- Build, evaluate and train General Multi-Agent Assistance with ease☆506Updated this week
- ☆302Updated last week
- Awesome In-Context RL: A curated list of In-Context Reinforcement Learning☆212Updated 3 weeks ago