hkproj / dpo-notes
Notes on Direct Preference Optimization
☆21 · Updated last year
Alternatives and similar repositories for dpo-notes
Users who are interested in dpo-notes are comparing it to the libraries listed below.
- Minimal GRPO implementation from scratch ☆97 · Updated 6 months ago
- Distributed training (multi-node) of a Transformer model ☆83 · Updated last year
- A curated list of resources on the role of small models in the LLM era ☆104 · Updated 11 months ago
- ☆48 · Updated last year
- Code for "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free" ☆81 · Updated 11 months ago
- This is the official repository for Inheritune. ☆113 · Updated 7 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆176 · Updated 5 months ago
- Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts. ☆133 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated last year
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆284 · Updated 2 weeks ago
- ☆127 · Updated 11 months ago
- ☆95 · Updated 11 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆215 · Updated last month
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆64 · Updated 4 months ago
- RL significantly improves the reasoning capability of Qwen2.5-1.5B-Instruct ☆30 · Updated 6 months ago
- SSRL: Self-Search Reinforcement Learning ☆131 · Updated 3 weeks ago
- Code for training and evaluating Contextual Document Embedding models ☆197 · Updated 4 months ago
- Complex Function Calling Benchmark. ☆135 · Updated 7 months ago
- Prune transformer layers ☆69 · Updated last year
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆93 · Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆48 · Updated 4 months ago
- Code for the NeurIPS LLM Efficiency Challenge ☆59 · Updated last year
- Code for the EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning" ☆55 · Updated 11 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆118 · Updated last year
- Repo for "Z1: Efficient Test-time Scaling with Code" ☆64 · Updated 5 months ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆64 · Updated last year
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆105 · Updated 3 months ago
- Official repository for the paper "ReasonIR: Training Retrievers for Reasoning Tasks". ☆198 · Updated 2 months ago
- Verifiers for LLM Reinforcement Learning ☆72 · Updated 5 months ago
- ☆85 · Updated last year