jasonvanf / llama-trl
LLaMA-TRL: Fine-tuning LLaMA with PPO and LoRA
☆229 · Updated last month
Alternatives and similar repositories for llama-trl
Users who are interested in llama-trl are comparing it to the libraries listed below.
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆266 · Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆139 · Updated 5 months ago
- A large-scale, fine-grained, diverse preference dataset (and models). ☆353 · Updated last year
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo… ☆396 · Updated 3 months ago
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024] ☆570 · Updated 10 months ago
- Generative Judge for Evaluating Alignment ☆246 · Updated last year
- ☆280 · Updated 9 months ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆275 · Updated 2 years ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆514 · Updated last year
- Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat ☆115 · Updated 2 years ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆176 · Updated 3 months ago
- PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets ☆341 · Updated last year
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆497 · Updated 11 months ago
- Datasets for Instruction Tuning of Large Language Models ☆257 · Updated last year
- ☆342 · Updated 4 months ago
- ☆307 · Updated last year
- Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" [ICLR 2024] ☆376 · Updated last year
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO. ☆326 · Updated last year
- [ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models ☆114 · Updated 3 months ago
- Prod Env ☆430 · Updated 2 years ago
- Data and Code for Program of Thoughts [TMLR 2023] ☆287 · Updated last year
- RewardBench: the first evaluation tool for reward models. ☆640 · Updated 3 months ago
- Awesome papers for role-playing with language models ☆205 · Updated 11 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆256 · Updated 9 months ago
- A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. ☆379 · Updated 2 years ago
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆517 · Updated 8 months ago
- Collection of papers for scalable automated alignment. ☆93 · Updated 11 months ago
- Direct Preference Optimization from scratch in PyTorch ☆113 · Updated 6 months ago
- Papers and Datasets on Instruction Tuning and Following. ✨✨✨ ☆500 · Updated last year
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models. ☆250 · Updated 11 months ago