A framework for pitting LLMs against each other in an evolving library of games ⚔
☆35 · Apr 17, 2025 · Updated last year
Alternatives and similar repositories for ZeroSumEval
Users who are interested in ZeroSumEval are comparing it to the libraries listed below.
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆28 · Mar 6, 2024 · Updated 2 years ago
- A framework for building large-scale, deterministic, interactive workflows with a fault-tolerant, conversational UX ☆48 · Updated this week
- ☆13 · Updated this week
- 🐇 A rabbit-fast Rust reimplementation inspired by Claude Code, with native TUI, deeper tooling, and a cleaner path for terminal-first AI … ☆44 · Apr 9, 2026 · Updated 2 months ago
- Environments by the Prime Intellect Research Team ☆67 · Updated this week
- A framework that makes it effortless to convert any LLM into a reasoning agent like o1 or DeepSeek's R1 ☆24 · Oct 13, 2025 · Updated 8 months ago
- DSPy program/pipeline inspector widget for Jupyter/VSCode notebooks. ☆45 · Feb 15, 2024 · Updated 2 years ago
- Programmable chat templates for LLM training and inference. ☆121 · Updated this week
- slowly building a set of infinite riddle generators for data-hungry methods ☆14 · Nov 15, 2022 · Updated 3 years ago
- ☆13 · Feb 20, 2020 · Updated 6 years ago
- ☆18 · Sep 21, 2023 · Updated 2 years ago
- ReaSCAN is a synthetic navigation task that requires models to reason about surroundings over syntactically difficult languages. (NeurIPS… ☆19 · Nov 28, 2021 · Updated 4 years ago
- moodist ☆28 · Apr 23, 2026 · Updated 2 months ago
- Conversations with Search Engines ☆14 · Jun 12, 2023 · Updated 3 years ago
- Materials for the course "Computational Linguistics and Information Technologies" for 4th-year bachelor's students in the program "Fundamental and Appl… ☆10 · Mar 25, 2021 · Updated 5 years ago
- Abstraction and Reasoning Corpus ☆14 · Nov 22, 2022 · Updated 3 years ago
- Problem-Oriented Segmentation and Retrieval (EMNLP 2024 Findings) ☆34 · Nov 12, 2024 · Updated last year
- DSPy in action with open-source LLMs. ☆107 · Apr 9, 2024 · Updated 2 years ago
- A Parallel Russian-Simple Russian Dataset ☆18 · Mar 30, 2023 · Updated 3 years ago
- nyc is so back ☆21 · Jun 27, 2025 · Updated last year
- The dataset and code for PeerSum at EMNLP'23. ☆16 · Oct 20, 2025 · Updated 8 months ago
- A lexicon compiler for non-suffixational morphologies ☆14 · Jan 29, 2026 · Updated 5 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆454 · Feb 13, 2024 · Updated 2 years ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles ☆65 · May 8, 2025 · Updated last year
- A collection of implementations of fair ML algorithms ☆12 · Jan 7, 2018 · Updated 8 years ago
- This repository contains the ToolSelect dataset, which was used to fine-tune Llama-2 70B for tool selection. ☆23 · Mar 11, 2024 · Updated 2 years ago
- Efficient vector database for hundreds of millions of embeddings. ☆216 · May 17, 2024 · Updated 2 years ago
- ☆20 · Mar 22, 2024 · Updated 2 years ago
- A Bert2Bert model that is able to generate headlines! ☆12 · Nov 16, 2020 · Updated 5 years ago
- ☆18 · Apr 26, 2021 · Updated 5 years ago
- An evaluation toolbox for machine learning explanations ☆16 · Jan 7, 2024 · Updated 2 years ago
- Iterative specification refinement tool: feeds your docs through GPT Pro Extended Reasoning via Oracle for multiple revision rounds until… ☆60 · Mar 22, 2026 · Updated 3 months ago
- Experimental studies of my paper "Sampling Techniques in Bayesian Target Encoding" ☆12 · Dec 8, 2022 · Updated 3 years ago
- Simple GRPO scripts and configurations. ☆58 · Feb 6, 2025 · Updated last year
- An implementation of Anthropic's paper and essay "A statistical approach to model evaluations" ☆16 · Oct 6, 2025 · Updated 8 months ago
- A library to create and manage configuration files, especially for machine learning projects. ☆79 · Mar 14, 2022 · Updated 4 years ago
- Code for the EMNLP 2024 paper AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction ☆16 · Nov 4, 2024 · Updated last year
- ☆40 · Feb 5, 2026 · Updated 4 months ago
- Programming and Algorithm Theory 2019-2020, FiCL HSE ☆12 · Jun 9, 2020 · Updated 6 years ago