FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists
☆31 · Aug 14, 2025 · Updated 6 months ago
Alternatives and similar repositories for FBI
Users who are interested in FBI are comparing it to the repositories listed below.
- A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc. ☆15 · Feb 28, 2025 · Updated last year
- Official codebase accompanying our ACL 2022 paper "RELiC: Retrieving Evidence for Literary Claims" (https://relic.cs.umass.edu). ☆20 · May 14, 2022 · Updated 3 years ago
- ☆52 · Nov 27, 2024 · Updated last year
- Notes on Direct Preference Optimization ☆24 · Apr 14, 2024 · Updated last year
- The official repository for our EMNLP 2024 paper, Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretab… ☆20 · Feb 23, 2025 · Updated last year
- ☆25 · May 16, 2024 · Updated last year
- MetaQA: Combining Expert Agents for Multi-Skill Question Answering ☆23 · Mar 13, 2022 · Updated 3 years ago
- Collection of resources for RL and Reasoning ☆27 · Feb 3, 2025 · Updated last year
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM" ☆64 · Oct 26, 2024 · Updated last year
- ☆28 · Jul 20, 2017 · Updated 8 years ago
- Monitoring the health of ARR ☆29 · Jan 24, 2026 · Updated last month
- This repository contains implementation of CROSSGRAD (https://openreview.net/forum?id=r1Dx7fbCW) and DAN (https://arxiv.org/abs/1505.0781… ☆24 · Dec 28, 2018 · Updated 7 years ago
- A SapientML plugin of SapientMLGenerator ☆11 · Dec 23, 2025 · Updated 2 months ago
- [CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge ☆39 · Sep 17, 2025 · Updated 5 months ago
- 日本語マルチタスク言語理解ベンチマーク Japanese Massive Multitask Language Understanding Benchmark ☆38 · Oct 7, 2025 · Updated 4 months ago
- An Educational Framework Based on PyTorch for Deep Learning Education and Exploration ☆10 · Dec 24, 2023 · Updated 2 years ago
- A Model Agnostic function to directly remove specified layers from the LLM ☆10 · May 23, 2024 · Updated last year
- ☆12 · Dec 12, 2024 · Updated last year
- ☆12 · Apr 21, 2025 · Updated 10 months ago
- javascript animation capture examples 🎬 ☆13 · Mar 14, 2023 · Updated 2 years ago
- AAAI 2022 Paper: Bet even Beth Harmon couldn't learn chess like that :) ☆38 · Mar 3, 2021 · Updated 5 years ago
- A platform aimed at creating websites that perform self-optimization ☆12 · May 4, 2024 · Updated last year
- Ansible for building kaggle environment ☆13 · Jul 30, 2019 · Updated 6 years ago
- 2D physics engine ☆11 · Jan 12, 2023 · Updated 3 years ago
- [ACL 2023] Counterspeeches up my sleeve! Intent Distribution Learning and Persistent Fusion for Intent-Conditioned Counterspeech Generati… ☆10 · Sep 23, 2023 · Updated 2 years ago
- Official implementation of the paper "On the Importance of Environments in Human-Robot Coordination", published in RSS 2021. ☆16 · May 1, 2024 · Updated last year
- A RAG that can scale 🧑🏻‍💻 ☆11 · May 28, 2024 · Updated last year
- Query-focused summarization data ☆44 · Feb 17, 2023 · Updated 3 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆44 · Aug 10, 2024 · Updated last year
- Self-organized P2P ride sharing community ☆12 · Dec 3, 2024 · Updated last year
- Unveiling and Mitigating Bias in Mental Health Analysis with Large Language Models ☆12 · Jun 21, 2024 · Updated last year
- ☆11 · Jun 5, 2024 · Updated last year
- NLPBench: Evaluating NLP-Related Problem-solving Ability in Large Language Models ☆10 · Oct 27, 2023 · Updated 2 years ago
- berg 🦀 Transform the contents of Epub documents. ☆10 · Apr 27, 2023 · Updated 2 years ago
- VertMetric: An abstractive summarization evaluation package. VERT stands for Versatile Evaluation of Reduced Texts. ☆11 · Dec 20, 2018 · Updated 7 years ago
- Proof-of-concept of playing ScummVM with Speech Recognition ☆10 · Nov 30, 2021 · Updated 4 years ago
- ☆13 · Jan 29, 2026 · Updated last month
- Rust implementation of the Fift esoteric language ☆12 · Aug 19, 2025 · Updated 6 months ago
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 · May 8, 2024 · Updated last year