lisadunlap / VibeCheck
Automated Qualitative Analysis of LLMs (ICLR 2025)
☆35 Updated 2 weeks ago
Alternatives and similar repositories for VibeCheck:
Users who are interested in VibeCheck are comparing it to the libraries listed below.
- Programmable automated machine learning - proof of concept ☆14 Updated 6 months ago
- ☆9 Updated last year
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging ☆20 Updated last month
- Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs) ☆21 Updated 4 months ago
- [EMNLP 2024 Findings] ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs ☆24 Updated 6 months ago
- [NeurIPS XAIA & Springer] Code and notebooks to paper "A Fresh Look at Sanity Checks for Saliency Maps" ☆25 Updated 9 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆54 Updated 4 months ago
- Symmetric Encryption with Language Models ☆12 Updated last year
- ☆13 Updated 4 months ago
- Non-Pydantic, Non-JSON Schema, efficient AutoPrompting and Structured Output Library ☆28 Updated last month
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 Updated 10 months ago
- Create an LLM XML context document from an llms.txt file ☆17 Updated 7 months ago
- The repository for paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs" ☆12 Updated 4 months ago
- Wonderful Matrices to Build Small Language Models ☆44 Updated 2 months ago
- Unstract's interface to LLMs, Embeddings and VectorDBs. ☆18 Updated 8 months ago
- gzip Predicts Data-dependent Scaling Laws ☆34 Updated 10 months ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆62 Updated last month
- This repository contains expert evaluation interface and data evaluation script for the OpenScholar project. ☆24 Updated 5 months ago
- ☆27 Updated 7 months ago
- An attribution library for LLMs ☆38 Updated 7 months ago
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆13 Updated 4 months ago
- Efficiently computing & storing token n-grams from large corpora ☆22 Updated 6 months ago
- Using modal.com to process FineWeb-edu data ☆20 Updated 2 weeks ago
- Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claud… ☆26 Updated last month
- OpenPipe ART (Agent Reinforcement Trainer): train LLM agents ☆108 Updated this week
- ☆31 Updated 5 months ago
- The first dense retrieval model that can be prompted like an LM ☆70 Updated 7 months ago
- ☆17 Updated 6 months ago
- Repository containing awesome resources regarding Hugging Face tooling. ☆46 Updated last year
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 Updated 8 months ago