jasonvanf / llama-trl

LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA

☆220

Alternatives and similar repositories for llama-trl

Users who are interested in llama-trl are comparing it to the libraries listed below.

Spico197 / Humpback
🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.
☆140 Updated 3 months ago
OFA-Sys / gsm8k-ScRel
Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
☆268 Updated 10 months ago
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆561 Updated 7 months ago
OpenBMB / UltraFeedback
A large-scale, fine-grained, diverse preference dataset (and models).
☆345 Updated last year
allenai / FineGrainedRLHF
☆278 Updated 7 months ago
l294265421 / alpaca-rlhf
Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat
☆114 Updated 2 years ago
GAIR-NLP / auto-j
Generative Judge for Evaluating Alignment
☆244 Updated last year
OFA-Sys / InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆265 Updated last year
sangmichaelxie / doremi
PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
☆340 Updated last year
raunak-agarwal / instruction-datasets
All available datasets for Instruction Tuning of Large Language Models
☆255 Updated last year
neelsjain / NEFTune
Official repository of NEFTune: Noisy Embeddings Improves Instruction Finetuning
☆397 Updated last year
glgh / awesome-llm-human-preference-datasets
A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval.
☆372 Updated last year
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆381 Updated last month
Cohere-Labs-Community / parameter-efficient-moe
☆269 Updated last year
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆166 Updated last month
MARIO-Math-Reasoning / Super_MARIO
☆337 Updated 2 months ago
OpenLMLab / LEval
[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark
☆388 Updated last year
princeton-nlp / LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
☆474 Updated 9 months ago
agi-templar / Stable-Alignment
Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu…
☆352 Updated 2 years ago
git-cloner / llama-lora-fine-tuning
llama fine-tuning with lora
☆139 Updated last year
TIGER-AI-Lab / Program-of-Thoughts
Data and Code for Program of Thoughts [TMLR 2023]
☆280 Updated last year
itsnamgyu / reasoning-teacher
Large Language Models Are Reasoning Teachers (ACL 2023)
☆341 Updated 4 months ago
voidism / DoLa
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
☆504 Updated 6 months ago
nelson-liu / lost-in-the-middle
Code and data for "Lost in the Middle: How Language Models Use Long Contexts"
☆354 Updated last year
THUDM / LongAlign
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
☆253 Updated 7 months ago
night-chen / ToolQA
ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …
☆272 Updated last year
RenzeLou / awesome-instruction-learning
Papers and Datasets on Instruction Tuning and Following. ✨✨✨
☆498 Updated last year
xiaoya-li / Instruction-Tuning-Survey
Project for the paper entitled `Instruction Tuning for Large Language Models: A Survey`
☆180 Updated 8 months ago
AI21Labs / in-context-ralm
☆284 Updated last year
YuxiXie / MCTS-DPO
This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.
☆319 Updated last year