lisadunlap / VibeCheck
Automated Qualitative Analysis of LLMs (ICLR 2025)
☆46 · Updated 2 months ago
Alternatives and similar repositories for VibeCheck
Users who are interested in VibeCheck are comparing it to the libraries listed below.
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆62 · Updated 9 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆33 · Updated 5 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 7 months ago
- Aioli: A unified optimization framework for language model data mixing ☆27 · Updated 8 months ago
- ☆57 · Updated last year
- ☆81 · Updated this week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated last year
- Verifiers for LLM Reinforcement Learning ☆74 · Updated 5 months ago
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation ☆34 · Updated last month
- Evaluating LLMs with fewer examples ☆161 · Updated last year
- Source code for the collaborative reasoner research project at Meta FAIR. ☆103 · Updated 5 months ago
- ☆24 · Updated 4 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 · Updated 11 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 · Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆88 · Updated last year
- An attribution library for LLMs ☆42 · Updated last year
- ☆47 · Updated 5 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales" ☆108 · Updated 11 months ago
- ☆49 · Updated 7 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆151 · Updated 7 months ago
- The first dense retrieval model that can be prompted like an LM ☆89 · Updated 4 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆168 · Updated last week
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆75 · Updated 9 months ago
- Learning to route instances for Human vs AI Feedback (ACL Main '25) ☆24 · Updated 2 months ago
- ☆22 · Updated 6 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆55 · Updated 6 months ago
- ☆39 · Updated 11 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated last year
- Code for ExploreTom ☆86 · Updated 3 months ago
- ☆67 · Updated 5 months ago