AI4Bharat / FBI
FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists
☆21 Updated 3 months ago
Related projects
Alternatives and complementary repositories for FBI
- Consists of the largest (10K) human-annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati… ☆33 Updated last year
- Code for Zero-Shot Tokenizer Transfer ☆115 Updated 2 weeks ago
- IndicGenBench is a high-quality, multilingual, multi-way parallel benchmark for evaluating Large Language Models (LLMs) on 4 user-facing … ☆41 Updated 2 months ago
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM" ☆54 Updated 2 weeks ago
- ☆46 Updated last month
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 Updated last year
- Code for training & evaluating Contextual Document Embedding models ☆92 Updated this week
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆22 Updated 7 months ago
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆25 Updated last year
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 3 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆71 Updated last month
- Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions." ☆61 Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆40 Updated 8 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆38 Updated 2 weeks ago
- ☆24 Updated last year
- ☆32 Updated last year
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆51 Updated last month
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models" ☆39 Updated 9 months ago
- Code for NeurIPS LLM Efficiency Challenge ☆53 Updated 7 months ago
- ☆46 Updated 9 months ago
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆48 Updated this week
- This is the official repository for Inheritune. ☆105 Updated last month
- ☆38 Updated this week
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ☆65 Updated 8 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆76 Updated 7 months ago
- The first dense retrieval model that can be prompted like an LM ☆62 Updated last month
- ☆65 Updated last year
- ☆43 Updated last month
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆32 Updated 8 months ago