CentreSecuriteIA / BELLS
Benchmarks for the Evaluation of LLM Supervision
☆30 · Updated this week
Alternatives and similar repositories for BELLS:
Users interested in BELLS are comparing it to the libraries listed below.
- METR Task Standard ☆135 · Updated 2 weeks ago
- Collection of evals for Inspect AI ☆47 · Updated this week
- Inspect: A framework for large language model evaluations ☆724 · Updated this week (a minimal usage sketch appears after this list)
- An open-source compliance-centered evaluation framework for Generative AI models ☆121 · Updated last month
- Mechanistic Interpretability Visualizations using React ☆220 · Updated last month
- This repository collects all relevant resources about interpretability in LLMs ☆305 · Updated 2 months ago
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆74 · Updated this week
- 📚 A curated list of papers & technical articles on AI Quality & Safety ☆166 · Updated last year
- The Foundation Model Transparency Index ☆73 · Updated 7 months ago
- Machine Learning for Alignment Bootcamp ☆25 · Updated 10 months ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆206 · Updated 11 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆93 · Updated 10 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆209 · Updated 6 months ago
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆266 · Updated 4 months ago
- Steering vectors for transformer language models in PyTorch / Hugging Face ☆78 · Updated last month
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ☆488 · Updated 6 months ago
- Stanford CRFM's initiative to assess potential compliance with the draft EU AI Act ☆93 · Updated last year
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆176 · Updated last month
- Automatically evaluate your LLMs in Google Colab ☆579 · Updated 8 months ago
- A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. ☆106 · Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆973 · Updated this week
- Fiddler Auditor is a tool to evaluate language models. ☆174 · Updated 10 months ago
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆392 · Updated 5 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆175 · Updated 3 months ago
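
Several of the repositories above (Inspect, the Inspect evals collection, Lighteval, HarmBench) are evaluation harnesses rather than papers or datasets. As a point of reference for the Inspect entry flagged above, here is a minimal task-definition sketch based on Inspect's documented quick-start; the model string is a placeholder, and parameter names (e.g. `solver=` versus the older `plan=`) may differ between Inspect versions.

```python
# Minimal Inspect task sketch (assumes `pip install inspect-ai`).
from inspect_ai import Task, eval, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import includes
from inspect_ai.solver import generate

@task
def hello_world():
    return Task(
        # One-sample dataset: ask the model to produce a fixed string.
        dataset=[Sample(input="Reply with exactly: Hello World", target="Hello World")],
        solver=[generate()],   # single-step solver: just call the model once
        scorer=includes(),     # pass if the target string appears in the output
    )

if __name__ == "__main__":
    # The model string is only an example; any provider/model Inspect supports works here.
    eval(hello_world(), model="openai/gpt-4o-mini")
```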