michaelnny / InstructLLaMA
Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT but at a much smaller scale.
☆56 · Updated last year
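For context on the RLHF stage this repository implements: InstructGPT-style pipelines typically optimize a clipped PPO objective over responses sampled from the policy, scored by a reward model with a KL penalty against a frozen reference. The sketch below is a minimal, generic PyTorch illustration of that clipped policy loss, not code from InstructLLaMA; the function name and tensor shapes are assumptions.

```python
import torch

def ppo_clip_loss(logprobs, old_logprobs, advantages, clip_eps=0.2):
    # Hypothetical helper, not from InstructLLaMA: the standard clipped
    # PPO policy objective used in InstructGPT-style RLHF fine-tuning.
    # logprobs / old_logprobs: per-token log-probs of the sampled response
    # under the current policy and the rollout-time policy.
    # advantages: per-token advantage estimates (e.g. GAE over reward-model
    # scores, usually with a per-token KL penalty folded into the reward).
    ratio = torch.exp(logprobs - old_logprobs)  # importance-sampling ratio
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Pessimistic bound: element-wise minimum of the two objectives,
    # negated so it can be minimized with a standard optimizer.
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random tensors standing in for one batch of rollouts
torch.manual_seed(0)
lp, old_lp, adv = torch.randn(8), torch.randn(8), torch.randn(8)
print(ppo_clip_loss(lp, old_lp, adv).item())
```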
Alternatives and similar repositories for InstructLLaMA
Users who are interested in InstructLLaMA are comparing it to the libraries listed below.
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA ☆237 · Updated 5 months ago
- A large-scale, fine-grained, diverse preference dataset (and models). ☆361 · Updated 2 years ago
- ☆130 · Updated last year
- Source code for Self-Evaluation Guided MCTS for online DPO. ☆329 · Updated last week
- ☆160 · Updated last year
- ☆341 · Updated 8 months ago
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆220 · Updated 2 years ago
- (ICML 2024) AlphaZero-like tree search can guide large language model decoding and training ☆285 · Updated last year
- RLHF implementation details of OpenAI's 2019 codebase ☆197 · Updated 2 years ago
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆115 · Updated last year
- Direct Preference Optimization from scratch in PyTorch (a minimal loss sketch follows this list) ☆126 · Updated 10 months ago
- Scripts for fine-tuning Llama2 via SFT and DPO. ☆206 · Updated 2 years ago
- Code and data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆269 · Updated last year
- Code implementation of synthetic continued pretraining ☆148 · Updated last year
- RewardBench: the first evaluation tool for reward models. ☆685 · Updated last week
- ☆282 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆151 · Updated 11 months ago
- Code for the ACL 2024 paper "Adversarial Preference Optimization (APO)" ☆56 · Updated last year
- ☆322 · Updated last year
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" ☆199 · Updated 2 years ago
- [ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement ☆193 · Updated last year
- A curated list of human preference datasets for LLM fine-tuning, RLHF, and evaluation. ☆386 · Updated 2 years ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆512 · Updated last year
- Reasoning with Language Model is Planning with World Model ☆185 · Updated 2 years ago
- ☆554 · Updated last year
- Critique-out-Loud Reward Models ☆73 · Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆416 · Updated 7 months ago
- A continually updated list of literature on Reinforcement Learning from AI Feedback (RLAIF) ☆195 · Updated 6 months ago
- PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets ☆350 · Updated 2 years ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆546 · Updated last year
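Several of the repositories above implement preference-optimization losses (DPO, TDPO, APO). For the from-scratch DPO entry referenced in the list, here is a minimal sketch of the standard DPO loss (Rafailov et al., 2023) in PyTorch; the function name and the assumption that log-probabilities arrive pre-summed per response are mine, not taken from any listed repository.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Hypothetical helper illustrating the DPO objective. Each argument is
    # a 1-D tensor of summed per-token log-probs for a batch of
    # (chosen, rejected) response pairs under the trainable policy or the
    # frozen reference model.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # Maximize the margin between chosen and rejected, scaled by beta,
    # via a logistic (Bradley-Terry) preference likelihood.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probs for a batch of 4 preference pairs
torch.manual_seed(0)
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())
```

The `beta` coefficient plays the role of the KL penalty in PPO-based RLHF: smaller values keep the policy closer to the reference model, larger values fit the preference data more aggressively.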