laude-institute / t-bench
A benchmark for LLMs on complex tasks in the terminal
☆30 · Updated this week
Alternatives and similar repositories for t-bench:
Users interested in t-bench are comparing it to the libraries listed below.
- Official Repository for Dataset Inference for LLMs ☆33 · Updated 9 months ago
- ☆23 · Updated 2 months ago
- ☆54 · Updated 2 years ago
- Data for "Datamodels: Predicting Predictions with Training Data" ☆97 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆70 · Updated last month
- ☆42 · Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆95 · Updated 2 months ago
- ☆33 · Updated 4 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆111 · Updated last year
- ☆47 · Updated last year
- PyTorch library for Active Fine-Tuning ☆64 · Updated 2 months ago
- Official PyTorch Implementation for Meaning Representations from Trajectories in Autoregressive Models (ICLR 2024) ☆20 · Updated 11 months ago
- AI Logging for Interpretability and Explainability🔬 ☆111 · Updated 10 months ago
- ☆28 · Updated last year
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆35 · Updated 5 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆84 · Updated 5 months ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆88 · Updated this week
- Official PyTorch implementation of "Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data" (NeurIPS'23) ☆15 · Updated last year
- A library for efficient patching and automatic circuit discovery. ☆63 · Updated this week
- ☆28 · Updated 2 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆68 · Updated last year
- This repository contains data, code and models for contextual noncompliance. ☆21 · Updated 9 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆77 · Updated 6 months ago
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆25 · Updated last year
- ☆35 · Updated 6 months ago
- ☆72 · Updated 11 months ago
- ☆27 · Updated 9 months ago
- Code for "Universal Adversarial Triggers Are Not Universal." ☆17 · Updated 11 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 · Updated 11 months ago
- The repository contains code for Adaptive Data Optimization ☆24 · Updated 4 months ago