davendw49 / llm_training_full_stack
Full Stack Practice of the Large Language Model Training @ RLChina 2024
★37 Updated 4 months ago
Alternatives and similar repositories for llm_training_full_stack:
Users who are interested in llm_training_full_stack are comparing it to the libraries listed below:
- A comprehensive list of PAPERS, CODEBASES, and DATASETS on Decision Making using Foundation Models including LLMs and VLMs. ★354 Updated 9 months ago
- [NeurIPS 2023] Large Language Models Are Semi-Parametric Reinforcement Learning Agents ★34 Updated 9 months ago
- Direct preference optimization with f-divergences. ★13 Updated 3 months ago
- An index of algorithms for reinforcement learning from human feedback (RLHF) ★92 Updated 9 months ago
- ★71 Updated last year
- Monitoring recent cross-research on LLM & RL on arXiv for control. If there are good papers, PRs are welcome. ★284 Updated 5 months ago
- ★53 Updated 7 months ago
- A New Approach to Solving SMAC Task: Generating Decision Tree Code from Large Language Models ★27 Updated 2 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ★127 Updated 10 months ago
- Implementation of TWOSOME ★62 Updated last month
- A collection of LLM with RL papers ★253 Updated 9 months ago
- ★27 Updated 3 months ago
- ★24 Updated 10 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ★288 Updated 6 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ★90 Updated last year
- Preference Transformer: Modeling Human Preferences using Transformers for RL (ICLR2023 Accepted) ★158 Updated last year
- (ICML 2024) Alphazero-like Tree-Search can guide large language model decoding and training ★258 Updated 8 months ago
- [NeurIPS 2023] We use large language models as commonsense world model and heuristic policy within Monte-Carlo Tree Search, enabling bett… ★253 Updated 3 months ago
- Reference implementation for Token-level Direct Preference Optimization(TDPO) ★126 Updated this week
- ★30 Updated last year
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g… ★29 Updated last month
- awesome papers in LLM interpretability ★402 Updated last month
- LLM-PySC2 is NKAI Decision Team and NUDT Decision Team's Python component of the StarCraft II LLM Decision Environment. It exposes Deepmi… ★105 Updated last month
- This repo is reproduction resources for linear alignment paper, still working ★17 Updated 8 months ago
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ★52 Updated 2 months ago
- ★12 Updated 11 months ago
- Awesome LLM papers, news and projects about learning to reason with LLM, OpenAI o1, reasoning techniques, chain-of-thought (COT), Large … ★23 Updated 4 months ago
- [AAAI 2023 Oral] Contrastive Identity-Aware Learning for Multi-Agent Value Decomposition ★30 Updated 8 months ago
- Implementation of the paper "Q-Adapter: Customizing Pre-trained LLMs to New Preferences with Forgetting Mitigation" ★13 Updated 4 months ago