michaelnny / InstructLLaMA
Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT, but at a much smaller scale.
☆48 · Updated last year
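As a rough illustration of the supervised fine-tuning stage named in the description above, a minimal SFT loss in plain PyTorch might look like the sketch below. This is not InstructLLaMA's actual code; it assumes `model(input_ids)` returns per-token logits of shape `(batch, seq_len, vocab)` and that `labels` equals `input_ids` with prompt and padding positions set to `-100`.

```python
import torch.nn.functional as F

def sft_loss(model, input_ids, labels):
    """Next-token cross-entropy on response tokens only (illustrative sketch).

    Assumes `labels` mirrors `input_ids` with prompt/padding positions
    replaced by -100 so they are ignored by the loss.
    """
    logits = model(input_ids)                 # (batch, seq_len, vocab)
    shift_logits = logits[:, :-1, :]          # position t predicts token t+1
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,                    # mask out prompt / padding tokens
    )
```

The RLHF stage then typically trains a reward model on human preference pairs and optimizes the SFT policy against it with PPO, using a KL penalty toward the SFT model to keep generations close to the fine-tuned distribution.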
Alternatives and similar repositories for InstructLLaMA:
Users interested in InstructLLaMA are comparing it to the repositories listed below.
- Direct Preference Optimization from scratch in PyTorch (see the DPO loss sketch after this list) ☆89 · Updated last year
- ☆136 · Updated 4 months ago
- RLHF implementation details of OAI's 2019 codebase ☆184 · Updated last year
- ☆117 · Updated 9 months ago
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆204 · Updated 2 years ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA ☆208 · Updated last year
- Source code for Self-Evaluation Guided MCTS for online DPO ☆299 · Updated 7 months ago
- Code and data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆251 · Updated 6 months ago
- Official repo for the ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆119 · Updated 9 months ago
- Code for the ACL 2024 paper Adversarial Preference Optimization (APO) ☆52 · Updated 9 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆130 · Updated last month
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆94 · Updated last year
- Critique-out-Loud Reward Models ☆56 · Updated 5 months ago
- ☆325 · Updated last month
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆108 · Updated 3 weeks ago
- Scripts for LLM pre-training and fine-tuning (with/without LoRA, DeepSpeed) ☆77 · Updated last year
- Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat ☆115 · Updated last year
- ☆271 · Updated 2 months ago
- (ICML 2024) AlphaZero-like tree search can guide large language model decoding and training ☆260 · Updated 10 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation ☆137 · Updated 9 months ago
- ☆65 · Updated 11 months ago
- Scripts for fine-tuning Llama2 via SFT and DPO ☆195 · Updated last year
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆153 · Updated last year
- Code implementation of synthetic continued pretraining ☆95 · Updated 2 months ago
- A large-scale, fine-grained, diverse preference dataset (and models) ☆335 · Updated last year
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" ☆179 · Updated last year
- RewardBench: the first evaluation tool for reward models ☆532 · Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆125 · Updated 3 months ago
- ☆144 · Updated 3 months ago
- ☆264 · Updated 8 months ago
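Several of the entries above (DPO from scratch, TDPO, APO) build on the Direct Preference Optimization objective. As a rough reference, a minimal DPO loss might look like the sketch below; the function and tensor names are illustrative, and the per-sequence log-probabilities (summed over response tokens) are assumed to be computed elsewhere under the trainable policy and a frozen reference model.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (illustrative sketch).

    Each argument is a 1-D tensor of per-sequence log-probabilities for the
    chosen / rejected responses under the policy or the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the implicit reward margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Variants such as TDPO and APO modify this objective (e.g., token-level constraints or adversarially updated preference data) while keeping the same preference-pair training setup.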