hkproj / dpo-notes
Notes on Direct Preference Optimization
☆19 · Updated last year
Alternatives and similar repositories for dpo-notes:
Users who are interested in dpo-notes are comparing it to the repositories listed below.
- Distributed training (multi-node) of a Transformer model ☆66 · Updated last year
- Minimal GRPO implementation from scratch ☆88 · Updated last month
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆47 · Updated last week
- Notes and commented code for RLHF (PPO) ☆90 · Updated last year
- Repository containing awesome resources regarding Hugging Face tooling. ☆47 · Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆77 · Updated 7 months ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆64 · Updated 9 months ago
- This is the official repository for Inheritune. ☆111 · Updated 3 months ago
- Large language models for document ranking. ☆51 · Updated 3 weeks ago
- Prune transformer layers ☆69 · Updated 11 months ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆54 · Updated 7 months ago
- This is the code of MMOA-RAG. ☆51 · Updated last month
- A curated list of the role of small models in the LLM era ☆99 · Updated 7 months ago
- Complex Function Calling Benchmark. ☆99 · Updated 3 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆57 · Updated last year
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆84 · Updated 7 months ago
- An assignment for building an NLP system from scratch. ☆25 · Updated last year
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆47 · Updated 3 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 8 months ago
- Code for the paper "Self-Training Elicits Concise Reasoning in Large Language Models" ☆30 · Updated 2 weeks ago
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆68 · Updated 6 months ago
- It is a comprehensive resource hub compiling all LLM papers accepted at the International Conference on Learning Representations (ICLR) i… ☆61 · Updated last year
- A brief and partial summary of RLHF algorithms. ☆128 · Updated 2 months ago
- What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective ☆63 · Updated 2 months ago