davendw49 / llm_training_full_stack
Full Stack Practice of the Large Language Model Training @ RLChina 2024
☆39 Updated 5 months ago
Alternatives and similar repositories for llm_training_full_stack:
Users who are interested in llm_training_full_stack are comparing it to the repositories listed below.
- A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models ☆35 Updated 4 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆35 Updated 10 months ago
- A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models including LLMs and VLMs. ☆362 Updated 11 months ago
- SOTA RL fine-tuning solution for advanced math reasoning of LLM ☆92 Updated this week
- Direct preference optimization with f-divergences. ☆13 Updated 4 months ago
- This is the official implementation of paper "Leveraging Dual Process Theory in Language Agent Framework for Simultaneous Human-AI Collab… ☆30 Updated 2 weeks ago
- A collection of LLM with RL papers ☆266 Updated 11 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆156 Updated last year
- Natural Language Reinforcement Learning ☆84 Updated 3 months ago
- Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome. ☆348 Updated 6 months ago
- ☆74 Updated last year
- LLM-PySC2 is NKAI Decision Team and NUDT Decision Team's Python component of the StarCraft II LLM Decision Environment. It exposes Deepmi… ☆115 Updated 2 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models) ☆179 Updated last year
- Implementation of TWOSOME ☆69 Updated 2 months ago
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆93 Updated 11 months ago
- [AAAI 2023 Oral] Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition ☆33 Updated 9 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆301 Updated 7 months ago
- ☆13 Updated last year
- DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents ☆19 Updated last month
- AAAI24(Oral) ProAgent: Building Proactive Cooperative Agents with Large Language Models ☆77 Updated 3 weeks ago
- ☆58 Updated 8 months ago
- ☆29 Updated last year
- ☆62 Updated last year
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ☆260 Updated 10 months ago
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g… ☆32 Updated 3 months ago
- Tracking literature and additional online resources on transformers for sequential decision making including RL and beyond. ☆44 Updated 2 years ago
- AdaPlanner: Language Models for Decision Making via Adaptive Planning from Feedback ☆106 Updated last year
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆94 Updated last year
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆125 Updated 3 months ago
- ☆13 Updated 5 months ago