lmarena / PPE
⭐33 Updated 3 months ago
Alternatives and similar repositories for PPE:
Users interested in PPE are comparing it to the repositories listed below.
- AI Logging for Interpretability and Explainability ⭐103 Updated 8 months ago
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024] ⭐130 Updated 5 months ago
- ⭐95 Updated 7 months ago
- Language models scale reliably with over-training and on downstream tasks ⭐96 Updated 10 months ago
- Directional Preference Alignment ⭐56 Updated 4 months ago
- ⭐89 Updated last year
- Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment" ⭐66 Updated last year
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ⭐52 Updated 10 months ago
- Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models ⭐43 Updated last year
- Long Context Extension and Generalization in LLMs ⭐48 Updated 4 months ago
- Self-Alignment with Principle-Following Reward Models ⭐154 Updated 11 months ago
- This is an official implementation of the paper ``Building Math Agents with Multi-Turn Iterative Preference Learning'' with multi-turn DP… ⭐20 Updated 2 months ago
- ⭐27 Updated 3 months ago
- ⭐47 Updated 10 months ago
- Critique-out-Loud Reward Models ⭐51 Updated 4 months ago
- ⭐80 Updated 11 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ⭐115 Updated 5 months ago
- ⭐34 Updated last year
- A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643 ⭐74 Updated last year
- ⭐26 Updated 7 months ago
- ⭐93 Updated last year
- ⭐46 Updated last year
- ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment ⭐50 Updated 8 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ⭐100 Updated 7 months ago
- ⭐132 Updated 2 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ⭐72 Updated 8 months ago
- ⭐95 Updated 4 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold" ⭐29 Updated 8 months ago
- ⭐44 Updated 6 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ⭐54 Updated 2 months ago