ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆34 Updated 5 months ago
Alternatives and similar repositories for ZeroSumEval
Users who are interested in ZeroSumEval are comparing it to the libraries listed below.
- ☆54 Updated 10 months ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆20 Updated last year
- Small, simple agent task environments for training and evaluation ☆18 Updated 11 months ago
- ☆81 Updated last week
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 8 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆110 Updated 9 months ago
- Simple GRPO scripts and configurations. ☆59 Updated 8 months ago
- Aioli: A unified optimization framework for language model data mixing ☆27 Updated 8 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆59 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated last year
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆64 Updated 9 months ago
- Understanding the correlation between different LLM benchmarks ☆29 Updated last year
- Verifiers for LLM Reinforcement Learning ☆74 Updated 5 months ago
- Based on the tree of thoughts paper ☆48 Updated 2 years ago
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆64 Updated 2 years ago
- ☆101 Updated 8 months ago
- ☆57 Updated last year
- Official repo for Learning to Reason for Long-Form Story Generation ☆72 Updated 5 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆44 Updated last year
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆184 Updated 6 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆24 Updated 2 weeks ago
- ☆39 Updated last year
- PyTorch implementation for MRL ☆19 Updated last year
- 🔔🧠 Easily experiment with popular language agents across diverse reasoning/decision-making benchmarks! ☆54 Updated 2 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆72 Updated last year
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆35 Updated last year
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆55 Updated 8 months ago
- ☆24 Updated 4 months ago
- Supercharge huggingface transformers with model parallelism. ☆77 Updated 2 months ago