harvey-fin / absence-bench
Code implementation for the paper AbsenceBench: Language Models Can't Tell What's Missing
☆17Updated 3 months ago
Alternatives and similar repositories for absence-bench
Users that are interested in absence-bench are comparing it to the libraries listed below
- ☆94Updated last week
- ☆31Updated last year
- ☆91Updated last month
- A reading list of relevant papers and projects on foundation model annotation☆28Updated 11 months ago
- Pivotal Token Search☆144Updated last month
- j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.☆101Updated 6 months ago
- ☆95Updated last week
- Efficiently computing & storing token n-grams from large corpora☆26Updated last year
- A collection of lightweight interpretability scripts to understand how LLMs think☆89Updated last week
- Simple GRPO scripts and configurations.☆59Updated 11 months ago
- Small, simple agent task environments for training and evaluation☆19Updated last year
- ☆48Updated 6 months ago
- ☆25Updated 8 months ago
- lossily compress representation vectors using product quantization☆59Updated 3 months ago
- ☆59Updated 2 months ago
- Project code for training LLMs to write better unit tests + code☆21Updated 8 months ago
- ☆59Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆260Updated last week
- Storing long contexts in tiny caches with self-study☆231Updated last month
- ☆19Updated last year
- ☆53Updated 11 months ago
- Python library to use Pleias-RAG models☆68Updated 9 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆34Updated 9 months ago
- Sphynx Hallucination Induction☆52Updated last year
- A tool for benchmarking LLMs on Modal☆45Updated 5 months ago
- AI Evaluation Platform☆47Updated 8 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832).☆82Updated last year
- chrome extension for renaming tabs showing paper-pdfs from common providers☆98Updated last year
- ☆48Updated 11 months ago
- Evaluating LLMs with fewer examples☆169Updated last year