laude-institute / t-bench
A benchmark for LLMs on complex tasks in the terminal
☆30 · Updated this week
Alternatives and similar repositories for t-bench:
Users interested in t-bench are comparing it to the libraries listed below.
- Official Repository for Dataset Inference for LLMs ☆33 · Updated 9 months ago
- ☆23 · Updated 2 months ago
- ☆54 · Updated 2 years ago
- Data for "Datamodels: Predicting Predictions with Training Data" ☆97 · Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆70 · Updated last month
- ☆42 · Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆95 · Updated 2 months ago
- ☆33 · Updated 4 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆111 · Updated last year
- ☆47 · Updated last year
- PyTorch library for Active Fine-Tuning ☆64 · Updated 2 months ago
- Official PyTorch Implementation for Meaning Representations from Trajectories in Autoregressive Models (ICLR 2024) ☆20 · Updated 11 months ago
- AI Logging for Interpretability and Explainability🔬 ☆111 · Updated 10 months ago
- ☆28 · Updated last year
- PaCE: Parsimonious Concept Engineering for Large Language Models (NeurIPS 2024) ☆35 · Updated 5 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆84 · Updated 5 months ago
- Run safety benchmarks against AI models and view detailed reports showing how well they performed. ☆88 · Updated this week
- Official PyTorch implementation of "Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data" (NeurIPS'23) ☆15 · Updated last year
- A library for efficient patching and automatic circuit discovery. ☆63 · Updated this week
- ☆28 · Updated 2 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆68 · Updated last year
- This repository contains data, code and models for contextual noncompliance. ☆21 · Updated 9 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral) ☆77 · Updated 6 months ago
- This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆25 · Updated last year
- ☆35 · Updated 6 months ago
- ☆72 · Updated 11 months ago
- ☆27 · Updated 9 months ago
- Code for "Universal Adversarial Triggers Are Not Universal." ☆17 · Updated 11 months ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 · Updated 11 months ago
- The repository contains code for Adaptive Data Optimization ☆24 · Updated 4 months ago