lisadunlap / VibeCheck
Automated Qualitative Analysis of LLMs (ICLR 2025)
☆53 Updated 6 months ago
Alternatives and similar repositories for VibeCheck
Users who are interested in VibeCheck are comparing it to the libraries listed below.
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation ☆40 Updated 3 months ago
- ☆59 Updated last year
- ☆94 Updated this week
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆66 Updated last year
- ☆92 Updated last month
- Source code for the collaborative reasoner research project at Meta FAIR. ☆112 Updated 9 months ago
- Aioli: A unified optimization framework for language model data mixing ☆32 Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆61 Updated last year
- Codebase accompanying the Summary of a Haystack paper. ☆80 Updated last year
- The first dense retrieval model that can be prompted like an LM ☆90 Updated 8 months ago
- ☆49 Updated 9 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 11 months ago
- ☆55 Updated last year
- Verifiers for LLM Reinforcement Learning ☆80 Updated 9 months ago
- Learning to route instances for Human vs AI Feedback (ACL Main '25) ☆26 Updated 5 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 Updated 9 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year
- An attribution library for LLMs ☆46 Updated last year
- When Reasoning Meets Its Laws ☆34 Updated 2 weeks ago
- Evaluating LLMs with fewer examples ☆169 Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔ ☆34 Updated 9 months ago
- ☆53 Updated 11 months ago
- Simple replication of [ColBERT-v1](https://arxiv.org/abs/2004.12832). ☆82 Updated last year
- Code for ExploreTom ☆89 Updated 6 months ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 Updated last year
- This repository contains expert evaluation interface and data evaluation script for the OpenScholar project. ☆29 Updated last year
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆76 Updated last year
- Analysis code for NeurIPS 2025 paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks" ☆55 Updated 5 months ago
- Sphynx Hallucination Induction ☆52 Updated 11 months ago
- Code for "Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs" ☆86 Updated 10 months ago