AI4Bharat / FBI
FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists
☆27 Updated 2 months ago
Alternatives and similar repositories for FBI:
Users who are interested in FBI often compare it to the repositories listed below.
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆25 Updated last year
- IndicGenBench is a high-quality, multilingual, multi-way parallel benchmark for evaluating Large Language Models (LLMs) on 4 user-facing … ☆43 Updated 5 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 Updated 11 months ago
- Consists of the largest (10K) human-annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati… ☆37 Updated 2 years ago
- Efficiently computing & storing token n-grams from large corpora ☆18 Updated 4 months ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 Updated last year
- ☆44 Updated 3 months ago
- Measuring and Controlling Persona Drift in Language Model Dialogs ☆16 Updated 11 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆25 Updated 10 months ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆48 Updated 7 months ago
- ☆24 Updated last year
- Using short models to classify long texts ☆21 Updated last year
- ☆31 Updated 8 months ago
- ☆57 Updated 4 months ago
- Documentation effort for the BookCorpus dataset ☆33 Updated 3 years ago
- Supervised instruction finetuning for LLM with HF Trainer and DeepSpeed ☆34 Updated last year
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for… ☆24 Updated 2 months ago
- GlotCC Dataset and Pipeline -- NeurIPS 2024 ☆17 Updated 3 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆57 Updated 11 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year
- ☆65 Updated last year
- Code repository for "Introducing Airavata: Hindi Instruction-tuned LLM" ☆55 Updated 3 months ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆56 Updated 8 months ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14 Updated last year
- Code for Zero-Shot Tokenizer Transfer ☆120 Updated last month
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 Updated 2 years ago
- This is a new metric that can be used to evaluate faithfulness of text generated by LLMs. The work behind this repository can be found he… ☆31 Updated last year
- Pre-train Static Word Embeddings ☆47 Updated 3 weeks ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆70 Updated 11 months ago