allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆46Updated last year
Alternatives and similar repositories for easy-to-hard-generalization:
Users that are interested in easy-to-hard-generalization are comparing it to the libraries listed below
- Codebase for Instruction Following without Instruction Tuning☆33Updated 5 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"☆44Updated last month
- ☆60Updated 10 months ago
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs☆52Updated 11 months ago
- ☆30Updated 2 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆55Updated 6 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆42Updated last year
- ☆64Updated 11 months ago
- B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners☆75Updated 2 months ago
- This is the oficial repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated last year
- Official github repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆130Updated 6 months ago
- ☆15Updated 8 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format☆27Updated last year
- The repository contains code for Adaptive Data Optimization☆20Updated 3 months ago
- Code and data used in the paper: "Training on Incorrect Synthetic Data via RL Scales LLM Math Reasoning Eight-Fold"☆29Updated 9 months ago
- Replicating O1 inference-time scaling laws☆83Updated 3 months ago
- Syntax Error-Free and Generalizable Tool Use for LLMs via Finite-State Decoding☆27Updated last year
- ☆39Updated 7 months ago
- CodeUltraFeedback: aligning large language models to coding preferences☆70Updated 8 months ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI☆96Updated 2 weeks ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…☆26Updated 6 months ago
- Benchmarking Benchmark Leakage in Large Language Models☆52Updated 10 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆45Updated 2 months ago