lisadunlap / VibeCheck
Automated Qualitative Analysis of LLMs
☆34 · Updated this week
Alternatives and similar repositories for VibeCheck:
Users interested in VibeCheck are comparing it to the repositories listed below.
- ☆9 · Updated 11 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆24 · Updated 5 months ago
- Programmable automated machine learning - proof of concept ☆14 · Updated 5 months ago
- ☆19 · Updated 2 weeks ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆52 · Updated 3 months ago
- A visual tool to interpret and understand PyTorch machine learning models ☆16 · Updated last year
- This repository contains the expert evaluation interface and data evaluation script for the OpenScholar project. ☆23 · Updated 4 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models… ☆33 · Updated last year
- ☆15 · Updated 5 months ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 · Updated 9 months ago
- [NeurIPS XAIA & Springer] Code and notebooks for the paper "A Fresh Look at Sanity Checks for Saliency Maps" ☆25 · Updated 8 months ago
- ☆27 · Updated 6 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆50 · Updated last week
- Code and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Ref… ☆46 · Updated 3 weeks ago
- ☆13 · Updated 2 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 6 months ago
- Aioli: A unified optimization framework for language model data mixing ☆22 · Updated 2 months ago
- Professional Wargaming LLM Toolbox ☆11 · Updated 5 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆14 · Updated 7 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆36 · Updated last year
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆83 · Updated 4 months ago