ZeroSumEval / ZeroSumEval
A framework for pitting LLMs against each other in an evolving library of games ⚔
☆33 Updated 2 weeks ago
Alternatives and similar repositories for ZeroSumEval:
Users who are interested in ZeroSumEval are comparing it to the libraries listed below:
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 Updated 7 months ago
- Aioli: A unified optimization framework for language model data mixing ☆23 Updated 2 months ago
- ☆48 Updated 5 months ago
- Lightweight tools for quick and easy LLM demos ☆26 Updated 6 months ago
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆72 Updated 7 months ago
- Learning to route instances for Human vs AI Feedback ☆23 Updated 2 months ago
- ☆80 Updated 3 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆53 Updated 4 months ago
- ☆33 Updated 9 months ago
- An attribution library for LLMs ☆38 Updated 6 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 Updated last year
- ☆27 Updated 3 weeks ago
- Small, simple agent task environments for training and evaluation ☆18 Updated 5 months ago
- PyTorch implementation for MRL ☆18 Updated last year
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval ☆32 Updated 5 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions ☆27 Updated last year
- Repository for NPHardEval, a quantified-dynamic benchmark of LLMs ☆52 Updated last year
- ☆21 Updated 6 months ago
- Simple GRPO scripts and configurations. ☆58 Updated 2 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆17 Updated 2 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆84 Updated 6 months ago
- Minimum Description Length probing for neural network representations ☆19 Updated 2 months ago
- ☆16 Updated 6 months ago
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆34 Updated this week
- ☆23 Updated last week
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆42 Updated last year
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆79 Updated 4 months ago
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆63 Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 2 months ago
- ☆41 Updated last week