lmarena / PPE
☆24Updated 2 weeks ago
Related projects ⓘ
Alternatives and complementary repositories for PPE
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆127Updated 2 months ago
- Language models scale reliably with over-training and on downstream tasks☆94Updated 7 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆48Updated 7 months ago
- ☆115Updated 4 months ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location.☆73Updated 3 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623☆71Updated last month
- ☆90Updated 4 months ago
- ☆75Updated last month
- ☆46Updated 2 weeks ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment"☆68Updated 5 months ago
- ☆89Updated 11 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆41Updated 9 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆97Updated 2 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following☆118Updated 4 months ago
- Long Context Extension and Generalization in LLMs☆39Updated 2 months ago
- AI Logging for Interpretability and Explainability🔬☆89Updated 5 months ago
- Simple and efficient pytorch-native transformer training and inference (batched)☆61Updated 7 months ago
- Directional Preference Alignment☆51Updated 2 months ago
- Explore what LLMs are really leanring over SFT☆26Updated 7 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users☆198Updated 2 weeks ago
- Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging☆98Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆124Updated 3 weeks ago
- [NeurIPS'24 Spotlight] Observational Scaling Laws☆44Updated last month
- A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.☆76Updated 9 months ago
- [NeurIPS 2024] Knowledge Circuits in Pretrained Transformers☆75Updated last month
- Reproduction of "RLCD Reinforcement Learning from Contrast Distillation for Language Model Alignment☆65Updated last year
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"☆26Updated 5 months ago
- Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; arXiv preprint arXiv:2403.…☆37Updated 4 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆63Updated last month
- ☆62Updated 3 weeks ago