AI4Bharat / FBI

FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists

☆28

Alternatives and similar repositories for FBI

Users who are interested in FBI are comparing it to the repositories listed below.

bminixhofer / tokenkit
A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.
☆27 Updated last month
google-research-datasets / indic-gen-bench
IndicGenBench is a high-quality, multilingual, multi-way parallel benchmark for evaluating Large Language Models (LLMs) on 4 user-facing …
☆49 Updated 9 months ago
EleutherAI / semantic-memorization
☆44 Updated 7 months ago
mungg / FABLES
☆57 Updated 9 months ago
yash-srivastava19 / arrakis
Arrakis is a library to conduct, track and visualize mechanistic interpretability experiments.
☆29 Updated 2 months ago
benpry / chain-of-thought-metaphor
This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…
☆14 Updated 2 years ago
okarthikb / attention-visualizer
LLM attention pattern visualizer
☆10 Updated last year
cisnlp / GlotCC
🕸 GlotCC Dataset and Pipeline -- NeurIPS 2024
☆19 Updated 2 months ago
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆57 Updated 9 months ago
CarperAI / decontamination
This repository contains code for cleaning your training data of benchmark data to help combat data snooping.
☆25 Updated 2 years ago
salesforce / summary-of-a-haystack
Codebase accompanying the Summary of a Haystack paper.
☆78 Updated 9 months ago
HazyResearch / aioli
Aioli: A unified optimization framework for language model data mixing
☆27 Updated 5 months ago
choosewhatulike / case2code
☆15 Updated 2 months ago
Knowledgator / FlashDeBERTa
Truly flash implementation of the DeBERTa disentangled attention mechanism.
☆59 Updated last month
TristanThrush / i-am-a-strange-dataset
Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models"
☆44 Updated last year
allenai / bff
☆38 Updated last year
petezh / OpenD5
Tasks for describing differences between text distributions.
☆16 Updated 10 months ago
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆43 Updated last year
liujch1998 / infini-gram
☆52 Updated 3 weeks ago
srush / LLM-Talk
☆51 Updated last year
LauraRuis / do-pigs-fly
☆19 Updated last year
allenai / super-benchmark
☆45 Updated 2 months ago
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆89 Updated 7 months ago
austrian-code-wizard / c3po
☆27 Updated this week
Zyphra / Zyda_processing
☆35 Updated last year
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆71 Updated last year
sfeucht / footprints
https://footprints.baulab.info
☆17 Updated 8 months ago
EleutherAI / tokengrams
Efficiently computing & storing token n-grams from large corpora
☆24 Updated 8 months ago
awslabs / rag-qa-arena
☆45 Updated 10 months ago
allenai / infinigram-api
☆61 Updated 3 weeks ago