hkproj / dpo-notes
Notes on Direct Preference Optimization
☆18 Updated 11 months ago
Alternatives and similar repositories for dpo-notes:
Users who are interested in dpo-notes are comparing it to the repositories listed below:
- This is the code of MMOA-RAG. ☆44 Updated last week
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆39 Updated last month
- ☆83 Updated 2 weeks ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024) ☆44 Updated 2 months ago
- Distributed training (multi-node) of a Transformer model ☆62 Updated 11 months ago
- ☆48 Updated last month
- [ICLR 2025] Benchmarking Agentic Workflow Generation ☆63 Updated last month
- Exploration of automated dataset selection approaches at large scales. ☆33 Updated 3 weeks ago
- ☆42 Updated last month
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆45 Updated last month
- ☆119 Updated 5 months ago
- Long Context Extension and Generalization in LLMs ☆50 Updated 6 months ago
- ☆41 Updated 11 months ago
- The official repo for "AceCoder: Acing Coder RL via Automated Test-Case Synthesis" ☆73 Updated last month
- NeurIPS 2024 tutorial on LLM Inference ☆39 Updated 3 months ago
- Code implementation of synthetic continued pretraining ☆95 Updated 2 months ago
- Benchmarking Benchmark Leakage in Large Language Models ☆52 Updated 10 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆57 Updated 11 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning ☆84 Updated last month
- ☆26 Updated 4 months ago
- A brief and partial summary of RLHF algorithms. ☆127 Updated 3 weeks ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning". ☆74 Updated 2 weeks ago
- Codebase accompanying the Summary of a Haystack paper. ☆75 Updated 6 months ago
- ☆102 Updated 3 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering ☆58 Updated 3 months ago
- Notes and commented code for RLHF (PPO) ☆77 Updated last year
- ☆66 Updated last week
- ☆35 Updated last month
- Training and Benchmarking LLMs for Code Preference. ☆33 Updated 4 months ago
- a curated list of the role of small models in the LLM era ☆95 Updated 6 months ago