i-gao / model-equality-testing
Test equality between a black-box LLM API and a reference distribution
☆11 Updated 10 months ago
Alternatives and similar repositories for model-equality-testing
Users who are interested in model-equality-testing compare it to the libraries listed below.
- ☆47 Updated last year
- ☆57 Updated 2 years ago
- Steering Llama 2 with Contrastive Activation Addition ☆176 Updated last year
- [ICLR 2025] Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs" ☆60 Updated 2 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆228 Updated 11 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆121 Updated 6 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆124 Updated 2 months ago
- ☆185 Updated last month
- [ICLR 2025] General-purpose activation steering library ☆95 Updated last month
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆24 Updated 9 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆135 Updated 3 months ago
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models" ☆110 Updated 6 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆207 Updated this week
- ☆165 Updated 9 months ago
- Using sparse coding to find distributed representations used by neural networks. ☆265 Updated last year
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆59 Updated 10 months ago
- ☆73 Updated 3 months ago