lisadunlap / VibeCheck
Automated Qualitative Analysis of LLMs (ICLR 2025)
☆35Updated last month
Alternatives and similar repositories for VibeCheck
Users who are interested in VibeCheck are comparing it to the repositories listed below.
- This repository contains expert evaluation interface and data evaluation script for the OpenScholar project.☆24Updated 6 months ago
- Aioli: A unified optimization framework for language model data mixing☆25Updated 4 months ago
- ☆19Updated last week
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs☆25Updated last week
- ☆50Updated this week
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆56Updated 5 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆57Updated 9 months ago
- ☆21Updated 3 months ago
- ☆38Updated last week
- ☆42Updated 2 months ago
- ☆57Updated 8 months ago
- Verifiers for LLM Reinforcement Learning☆55Updated last month
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆34Updated last year
- ☆9Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.☆20Updated last year
- ☆65Updated 2 months ago
- ☆16Updated 5 months ago
- ☆32Updated 4 months ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs☆13Updated 5 months ago
- ☆13Updated 5 months ago
- The first dense retrieval model that can be prompted like an LM☆73Updated 3 weeks ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods.☆24Updated last month
- [arXiv] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees☆19Updated 2 months ago
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025)☆21Updated 3 months ago
- Make reasoning models scalable☆32Updated this week
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs☆87Updated 6 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆12Updated 7 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆32Updated last month
- ☆41Updated 10 months ago
- Agent Skill Induction: "Inducing Programmatic Skills for Agentic Tasks"☆20Updated last month