michaelnny / InstructLLaMA
Implements pre-training, supervised fine-tuning (SFT), and reinforcement learning from human feedback (RLHF) to train and fine-tune the LLaMA2 model to follow human instructions, similar to InstructGPT or ChatGPT, but at a much smaller scale.
☆48 · Updated last year
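As a rough illustration of the supervised fine-tuning stage named in the description above, a minimal SFT loss in plain PyTorch might look like the sketch below. This is not InstructLLaMA's actual code; it assumes `model(input_ids)` returns per-token logits of shape `(batch, seq_len, vocab)` and that `labels` equals `input_ids` with prompt and padding positions set to `-100`.

```python
import torch.nn.functional as F

def sft_loss(model, input_ids, labels):
    """Next-token cross-entropy on response tokens only (illustrative sketch).

    Assumes `labels` mirrors `input_ids` with prompt/padding positions
    replaced by -100 so they are ignored by the loss.
    """
    logits = model(input_ids)                 # (batch, seq_len, vocab)
    shift_logits = logits[:, :-1, :]          # position t predicts token t+1
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,                    # mask out prompt / padding tokens
    )
```

The RLHF stage then typically trains a reward model on human preference pairs and optimizes the SFT policy against it with PPO, using a KL penalty toward the SFT model to keep generations close to the fine-tuned distribution.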
Alternatives and similar repositories for InstructLLaMA:
Users interested in InstructLLaMA are comparing it to the repositories listed below.
- Direct Preference Optimization from scratch in PyTorch (see the DPO loss sketch after this list) ☆89 · Updated last year
- ☆136 · Updated 4 months ago
- RLHF implementation details of OAI's 2019 codebase ☆184 · Updated last year
- ☆117 · Updated 9 months ago
- Code for STaR: Bootstrapping Reasoning With Reasoning (NeurIPS 2022) ☆204 · Updated 2 years ago
- LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA ☆208 · Updated last year
- Source code for Self-Evaluation Guided MCTS for online DPO ☆299 · Updated 7 months ago
- Code and data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆251 · Updated 6 months ago
- Official repo for the ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆119 · Updated 9 months ago
- Code for the ACL 2024 paper Adversarial Preference Optimization (APO) ☆52 · Updated 9 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆130 · Updated last month
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr… ☆94 · Updated last year
- Critique-out-Loud Reward Models ☆56 · Updated 5 months ago
- ☆325 · Updated last month
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆108 · Updated 3 weeks ago
- Scripts for LLM pre-training and fine-tuning (with/without LoRA, DeepSpeed) ☆77 · Updated last year
- Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat ☆115 · Updated last year
- ☆271 · Updated 2 months ago
- (ICML 2024) AlphaZero-like tree search can guide large language model decoding and training ☆260 · Updated 10 months ago
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation ☆137 · Updated 9 months ago
- ☆65 · Updated 11 months ago
- Scripts for fine-tuning Llama2 via SFT and DPO ☆195 · Updated last year
- Research code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL" ☆153 · Updated last year
- Code implementation of synthetic continued pretraining ☆95 · Updated 2 months ago
- A large-scale, fine-grained, diverse preference dataset (and models) ☆335 · Updated last year
- Code for the paper "ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models" ☆179 · Updated last year
- RewardBench: the first evaluation tool for reward models ☆532 · Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆125 · Updated 3 months ago
- ☆144 · Updated 3 months ago
- ☆264 · Updated 8 months ago
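Several of the entries above (DPO from scratch, TDPO, APO) build on the Direct Preference Optimization objective. As a rough reference, a minimal DPO loss might look like the sketch below; the function and tensor names are illustrative, and the per-sequence log-probabilities (summed over response tokens) are assumed to be computed elsewhere under the trainable policy and a frozen reference model.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss (illustrative sketch).

    Each argument is a 1-D tensor of per-sequence log-probabilities for the
    chosen / rejected responses under the policy or the frozen reference model.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the implicit reward margin between chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Variants such as TDPO and APO modify this objective (e.g., token-level constraints or adversarially updated preference data) while keeping the same preference-pair training setup.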