lisadunlap / VibeCheck
Automated Qualitative Analysis of LLMs
☆33 · Updated this week
Alternatives and similar repositories for VibeCheck:
Users interested in VibeCheck are comparing it to the libraries listed below.
- ☆9 · Updated 10 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs · ☆24 · Updated 3 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" · ☆35 · Updated last year
- Implementation of the paper "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" · ☆48 · Updated 2 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. · ☆106 · Updated this week
- ☆14 · Updated 4 months ago
- Code and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref… · ☆43 · Updated last week
- ☆39 · Updated 6 months ago
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans. · ☆49 · Updated this week
- ☆27 · Updated 3 months ago
- ☆13 · Updated 2 months ago
- A visual tool to interpret and understand PyTorch machine learning models · ☆16 · Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" · ☆12 · Updated 3 months ago
- ☆50 · Updated 2 months ago
- [NeurIPS XAIA & Springer] Code and notebooks for the paper "A Fresh Look at Sanity Checks for Saliency Maps" · ☆25 · Updated 7 months ago
- This repository contains the expert evaluation interface and data evaluation script for the OpenScholar project. · ☆23 · Updated 3 months ago
- A testbed for agents and environments that can automatically improve models through data generation. · ☆18 · Updated 2 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment · ☆54 · Updated 5 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data" · ☆17 · Updated 11 months ago
- Code for our paper PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles · ☆21 · Updated last month
- The first dense retrieval model that can be prompted like an LM · ☆64 · Updated 5 months ago
- Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs) · ☆20 · Updated 2 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… · ☆31 · Updated last year
- ☆20 · Updated last week
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. · ☆20 · Updated 8 months ago
- Codebase accompanying the Summary of a Haystack paper. · ☆74 · Updated 5 months ago
- Aioli: A unified optimization framework for language model data mixing · ☆20 · Updated last month
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. · ☆46 · Updated 8 months ago
- ☆57 · Updated 7 months ago