hkproj / dpo-notes
Notes on Direct Preference Optimization
☆16 · Updated 10 months ago
Alternatives and similar repositories for dpo-notes:
Users interested in dpo-notes are comparing it to the repositories listed below.
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆43 · Updated last month
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆52 · Updated 4 months ago
- ☆117 · Updated 4 months ago
- Distributed training (multi-node) of a Transformer model ☆54 · Updated 10 months ago
- Codebase for Instruction Following without Instruction Tuning ☆33 · Updated 4 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆54 · Updated 5 months ago
- ☆64 · Updated 2 weeks ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆44 · Updated 4 months ago
- ☆47 · Updated 5 months ago
- ☆53 · Updated 4 months ago
- Codebase accompanying the Summary of a Haystack paper ☆74 · Updated 5 months ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆59 · Updated 6 months ago
- PyTorch building blocks for the OLMo ecosystem ☆54 · Updated this week
- ☆48 · Updated last year
- ☆37 · Updated 10 months ago
- ☆17 · Updated 4 months ago
- List of papers on Self-Correction of LLMs ☆71 · Updated last month
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆43 · Updated this week
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆69 · Updated 2 months ago
- Source code for our paper: "Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction A… ☆43 · Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆31 · Updated last year
- NeurIPS 2024 tutorial on LLM Inference ☆39 · Updated 2 months ago
- Notes and commented code for RLHF (PPO) ☆69 · Updated 11 months ago
- ☆40 · Updated 9 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆32 · Updated 4 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆65 · Updated 6 months ago