A framework for pitting LLMs against each other in an evolving library of games ⚔
☆35 · Apr 17, 2025 · Updated 11 months ago
Alternatives and similar repositories for ZeroSumEval
Users that are interested in ZeroSumEval are comparing it to the libraries listed below
- Official Documentation for DSPy Library ☆21 · Mar 13, 2026 · Updated last week
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 · Apr 20, 2025 · Updated 11 months ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆27 · Mar 6, 2024 · Updated 2 years ago
- ☆11 · Dec 11, 2024 · Updated last year
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆45 · Feb 15, 2024 · Updated 2 years ago
- Structured outputs from DSPy and Jinja2 ☆27 · Jun 27, 2025 · Updated 8 months ago
- slowly building a set of infinite riddle generators for data-hungry methods ☆14 · Nov 15, 2022 · Updated 3 years ago
- A CLI tool you can pipe code into and then ask, via the OpenAI API, to make changes, add documentation, etc. ☆13 · Jan 5, 2024 · Updated 2 years ago
- ☆18 · Sep 21, 2023 · Updated 2 years ago
- ReaSCAN is a synthetic navigation task that requires models to reason about surroundings over syntactically difficult languages. (NeurIPS… ☆19 · Nov 28, 2021 · Updated 4 years ago
- Repository for paper Decrypting Cryptic Crosswords ☆10 · Jan 15, 2022 · Updated 4 years ago
- moodist ☆25 · Mar 13, 2026 · Updated last week
- ☆15 · Dec 10, 2021 · Updated 4 years ago
- LLM code editor for backend services ☆16 · Oct 19, 2024 · Updated last year
- Abstraction and Reasoning Corpus ☆14 · Nov 22, 2022 · Updated 3 years ago
- ☆15 · Dec 15, 2025 · Updated 3 months ago
- Problem-Oriented Segmentation and Retrieval EMNLP 2024 Findings ☆34 · Nov 12, 2024 · Updated last year
- DSPy in action with open-source LLMs. ☆105 · Apr 9, 2024 · Updated last year
- ☆22 · Aug 31, 2021 · Updated 4 years ago
- NLQuAD: A Non-Factoid Long Question Answering Data Set. To be published at EACL2021 ☆13 · May 18, 2021 · Updated 4 years ago
- nyc is so back ☆21 · Jun 27, 2025 · Updated 8 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆450 · Feb 13, 2024 · Updated 2 years ago
- A lexicon compiler for non-suffixational morphologies ☆13 · Jan 29, 2026 · Updated last month
- run deepseek v3 on a single node. Drops unused experts from memory. ☆16 · Jan 26, 2025 · Updated last year
- This repository contains the ToolSelect dataset which was used to fine-tune Llama-2 70B for tool selection. ☆22 · Mar 11, 2024 · Updated 2 years ago
- Efficient vector database for hundreds of millions of embeddings. ☆212 · May 17, 2024 · Updated last year
- ☆20 · Mar 22, 2024 · Updated last year
- ☆18 · Apr 26, 2021 · Updated 4 years ago
- A Toolbox for the Evaluation of machine learning Explanations ☆16 · Jan 7, 2024 · Updated 2 years ago
- ☆16 · Jan 3, 2023 · Updated 3 years ago
- ☆25 · Nov 19, 2025 · Updated 4 months ago
- Facilitates Visual Representation of Sign Language Data and Glosses ☆19 · May 16, 2025 · Updated 10 months ago
- Code, data, and pretrained models for the paper "Generating Wikipedia Article Sections from Diverse Data Sources" ☆20 · Feb 5, 2021 · Updated 5 years ago
- Simple GRPO scripts and configurations. ☆59 · Feb 6, 2025 · Updated last year
- A Ruby on Rails style framework for the DSPy (Demonstrate, Search, Predict) project for Language Models like GPT, BERT, and LLama. ☆132 · Oct 16, 2024 · Updated last year
- An implementation of Anthropic's paper and essay "A statistical approach to model evaluations" ☆17 · Oct 6, 2025 · Updated 5 months ago
- Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21) ☆48 · Dec 27, 2021 · Updated 4 years ago
- This is for EMNLP 2024 Paper: AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction ☆15 · Nov 4, 2024 · Updated last year
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆189 · May 3, 2025 · Updated 10 months ago