michaelnny / InstructLLaMA
Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT but on a much smaller scale.
☆51 · Updated last year
Alternatives and similar repositories for InstructLLaMA
Users interested in InstructLLaMA are comparing it to the libraries listed below.
- ☆138 · Updated 5 months ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA · ☆214 · Updated last year
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) · ☆205 · Updated 2 years ago
- Critique-out-Loud Reward Models · ☆64 · Updated 6 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) · ☆138 · Updated 3 months ago
- Source code for the Self-Evaluation Guided MCTS for online DPO · ☆306 · Updated 9 months ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… · ☆99 · Updated last year
- Code implementation of synthetic continued pretraining · ☆109 · Updated 4 months ago
- Direct Preference Optimization from scratch in PyTorch · ☆91 · Updated last month
- Code for the ACL 2024 paper "Adversarial Preference Optimization (APO)" · ☆54 · Updated 11 months ago
- Code and data for "Scaling Relationship on Learning Mathematical Reasoning with Large Language Models" · ☆261 · Updated 8 months ago
- ☆276 · Updated 4 months ago
- RLHF implementation details of OAI's 2019 codebase · ☆187 · Updated last year
- Collection of papers for scalable automated alignment · ☆89 · Updated 6 months ago
- ☆122 · Updated 10 months ago
- A large-scale, fine-grained, diverse preference dataset (and models) · ☆338 · Updated last year
- ☆67 · Updated last year
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" · ☆167 · Updated 3 weeks ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation · ☆139 · Updated last week
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" · ☆181 · Updated last year
- ☆110 · Updated 3 months ago
- ☆328 · Updated 3 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning · ☆151 · Updated 8 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI · ☆108 · Updated this week
- Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat · ☆115 · Updated last year
- Augmented LLM with self-reflection · ☆121 · Updated last year
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" · ☆77 · Updated 4 months ago
- ☆102 · Updated 5 months ago
- Official repo for the ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… · ☆123 · Updated 11 months ago
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… · ☆125 · Updated 10 months ago