lisadunlap / VibeCheck
Automated Qualitative Analysis of LLMs (ICLR 2025)
☆40 Updated last week
Alternatives and similar repositories for VibeCheck
Users that are interested in VibeCheck are comparing it to the libraries listed below
- ☆57 Updated 9 months ago
- ☆9 Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆58 Updated 7 months ago
- This repository contains expert evaluation interface and data evaluation script for the OpenScholar project. ☆25 Updated 7 months ago
- ☆20 Updated 4 months ago
- ☆50 Updated 2 weeks ago
- Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation ☆33 Updated 5 months ago
- The first dense retrieval model that can be prompted like an LM ☆80 Updated 2 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆79 Updated 9 months ago
- ☆48 Updated 5 months ago
- ☆45 Updated 3 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆27 Updated last month
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 Updated 10 months ago
- Official Repo for CRMArena and CRMArena-Pro ☆99 Updated 3 weeks ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 Updated 11 months ago
- ☆63 Updated last year
- Aioli: A unified optimization framework for language model data mixing ☆27 Updated 5 months ago
- Testing paligemma2 finetuning on reasoning dataset ☆18 Updated 6 months ago
- Unstract's interface to LLMs, Embeddings and VectorDBs. ☆18 Updated 11 months ago
- ☆22 Updated last month
- ☆48 Updated last year
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods. ☆23 Updated 3 months ago
- Accompanying code and SEP dataset for the "Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?" paper. ☆54 Updated 4 months ago
- Source code for the collaborative reasoner research project at Meta FAIR. ☆94 Updated 2 months ago
- Verifiers for LLM Reinforcement Learning ☆64 Updated 3 months ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆12 Updated 3 weeks ago
- ☆69 Updated last month
- ☆16 Updated 6 months ago
- ☆43 Updated 8 months ago
- Backtracing: Retrieving the Cause of the Query, EACL 2024 Long Paper, Findings. ☆89 Updated 11 months ago