☆41 May 9, 2025 Updated 10 months ago
Alternatives and similar repositories for SHADE-Arena
Users that are interested in SHADE-Arena are comparing it to the libraries listed below.
- ☆23 Jun 22, 2025 Updated 9 months ago
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments. ☆164 Updated this week
- A repository that holds templates, examples, and tests to help external parties submit tasks to AISI that conform with the Autonomous Sys… ☆11 Jan 16, 2026 Updated 2 months ago
- ☆28 Sep 23, 2025 Updated 6 months ago
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models" ☆16 Mar 31, 2025 Updated 11 months ago
- ☆11 Jun 5, 2024 Updated last year
- Align your LM to express calibrated verbal statements of confidence in its long-form generations. ☆29 Jun 4, 2024 Updated last year
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆20 Oct 18, 2024 Updated last year
- Bias Benchmark for Natural Language Inference. Code repo for the Findings of NAACL 2022 paper "On Measuring Social Biases in Prompt-Based… ☆15 Apr 28, 2022 Updated 3 years ago
- A graph structure analysis and visualization tool for the are.na platform. ☆12 Sep 3, 2017 Updated 8 years ago
- ☆39 Nov 3, 2025 Updated 4 months ago
- Plugin for LLM-CLI adding support for Together.AI hosting a large collection of open-source LLMs ☆18 Apr 10, 2024 Updated last year
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following" ☆31 Jun 5, 2025 Updated 9 months ago
- d3 force graph for are.na channels ☆19 Jun 12, 2024 Updated last year
- Resources for economic research on data privacy ☆14 Jul 4, 2019 Updated 6 years ago
- MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing… ☆10 Oct 7, 2024 Updated last year
- Code repo for the model organisms and convergent directions of EM papers. ☆57 Sep 22, 2025 Updated 6 months ago
- A tool for visualization of complex job searches. ☆13 Jul 8, 2022 Updated 3 years ago
- [ICLR 2025] Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron ☆30 Apr 30, 2025 Updated 10 months ago
- [ICLR 2025] General-purpose activation steering library ☆150 Sep 18, 2025 Updated 6 months ago
- ☆27 Oct 6, 2024 Updated last year
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆38 Nov 27, 2025 Updated 3 months ago
- Improving Alignment and Robustness with Circuit Breakers ☆258 Sep 24, 2024 Updated last year
- Simple Python utility to convert hand model from GraspIt! to a ROS-compatible URDF. ☆14 Dec 19, 2018 Updated 7 years ago
- Experiments with representation engineering ☆14 Feb 28, 2024 Updated 2 years ago
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective" ☆33 Jul 22, 2024 Updated last year
- ☆11 Oct 16, 2023 Updated 2 years ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆175 Mar 12, 2026 Updated last week
- A self-updating GitHub profile 🐯 ☆15 Updated this week
- ☆24 Aug 23, 2025 Updated 7 months ago
- A framework making it effortless to convert any llm model into a reasoning agent like o1 or DeepSeek's r1 ☆24 Oct 13, 2025 Updated 5 months ago
- Code to the paper: The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence ☆25 Jul 31, 2025 Updated 7 months ago
- WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m… ☆162 May 29, 2025 Updated 9 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 Apr 4, 2025 Updated 11 months ago
- Code for ICML 2023 paper "When and How Does Known Class Help Discover Unknown Ones? Provable Understandings Through Spectral Analysis" ☆14 Jun 24, 2023 Updated 2 years ago
- TextComplexityDE dataset consists of 1000 sentences in the German language with subjective complexity rating, collected from German learn… ☆13 Apr 8, 2022 Updated 3 years ago
- SVIP: Towards Verifiable Inference of Open-Source Large Language Models ☆14 Jun 3, 2025 Updated 9 months ago
- ☆20 Nov 15, 2024 Updated last year
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography. ☆25 Jan 26, 2024 Updated 2 years ago