davendw49 / llm_training_full_stack
Full Stack Practice of the Large Language Model Training @ RLChina 2024
☆39 · Updated 6 months ago
Alternatives and similar repositories for llm_training_full_stack:
Users who are interested in llm_training_full_stack are comparing it to the libraries listed below:
- A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models ☆39 · Updated last month
- Code release for "Generating Code World Models with Large Language Models Guided by Monte Carlo Tree Search" published at NeurIPS '24. ☆10 · Updated 2 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ☆35 · Updated last year
- A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models including LLMs and VLMs. ☆367 · Updated last year
- Direct preference optimization with f-divergences. ☆13 · Updated 6 months ago
- An index of algorithms for reinforcement learning from human feedback (RLHF) ☆92 · Updated last year
- LLM-PySC2 is NKAI Decision Team and NUDT Decision Team's Python component of the StarCraft II LLM Decision Environment. It exposes Deepmi… ☆121 · Updated 2 weeks ago
- A collection of LLM with RL papers ☆269 · Updated last year
- Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome. ☆383 · Updated 7 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆167 · Updated 3 weeks ago
- ☆76 · Updated last year
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆113 · Updated this week
- Awesome In-Context RL: A curated list of In-Context Reinforcement Learning ☆161 · Updated last month
- Natural Language Reinforcement Learning ☆87 · Updated 4 months ago
- Implementation of TWOSOME ☆71 · Updated 3 months ago
- A collection on the recent reproduction papers and projects on DeepSeek-R1 ☆30 · Updated 2 months ago
- Tracking literature and additional online resources on transformers for sequential decision making including RL and beyond. ☆45 · Updated 2 years ago
- [TNNLS-2024, arXiv-2023.2.10] Official repository of "A Survey on Causal Reinforcement Learning" ☆25 · Updated 2 weeks ago
- A library for constrained RLHF. ☆13 · Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 · Updated 5 months ago
- ☆59 · Updated 9 months ago
- ☆11 · Updated last year
- Benchmarking LLMs' Gaming Ability in Multi-Agent Environments ☆73 · Updated last week
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆99 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆138 · Updated 2 months ago
- ☆109 · Updated 3 months ago
- This is my attempt to create Self-Correcting-LLM based on the paper "Training Language Models to Self-Correct via Reinforcement Learning" by g… ☆35 · Updated last month
- Preference Transformer: Modeling Human Preferences using Transformers for RL (ICLR 2023 Accepted) ☆163 · Updated last year
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆306 · Updated 9 months ago
- This is the official implementation of paper "Leveraging Dual Process Theory in Language Agent Framework for Simultaneous Human-AI Collab… ☆34 · Updated last month