☆35May 9, 2025Updated 9 months ago
Alternatives and similar repositories for SHADE-Arena
Users that are interested in SHADE-Arena are comparing it to the libraries listed below
Sorting:
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments.☆158Updated this week
- Code to the paper: The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence☆26Jul 31, 2025Updated 7 months ago
- A collection of different ways to implement accessing and modifying internal model activations for LLMs☆20Oct 18, 2024Updated last year
- ☆27Oct 6, 2024Updated last year
- Align your LM to express calibrated verbal statements of confidence in its long-form generations.☆29Jun 4, 2024Updated last year
- Code repo for the model organisms and convergent directions of EM papers.☆53Sep 22, 2025Updated 5 months ago
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective"☆32Jul 22, 2024Updated last year
- Vstream - Video Analytics pipeline with Hardware based accelerations (dev - stage)☆10Feb 2, 2024Updated 2 years ago
- Open Source Replication of Anthropic's Alignment Faking Paper☆54Apr 4, 2025Updated 11 months ago
- WMDP is a LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m…☆160May 29, 2025Updated 9 months ago
- ☆43Feb 9, 2026Updated 3 weeks ago
- Improving Alignment and Robustness with Circuit Breakers☆258Sep 24, 2024Updated last year
- ☆16Jan 16, 2025Updated last year
- Residual Quantization Autoencoder, used for interpreting LLMs☆14Jan 1, 2025Updated last year
- Official repository for "DYPLOC: Dynamic Planning of Content Using Mixed Language Models for Opinion Text Generation"☆10May 20, 2022Updated 3 years ago
- Precision Knowledge Editing (PKE): A novel method to reduce toxicity in LLMs while preserving performance, with robust evaluations and ha…☆11Nov 26, 2024Updated last year
- 2020湖南省第一届人工智能大赛参赛作品☆11Feb 17, 2022Updated 4 years ago
- yolo目标检测算法☆15Jul 27, 2025Updated 7 months ago
- ☆12Aug 15, 2023Updated 2 years ago
- An implementation of MSSRM method☆11Mar 23, 2023Updated 2 years ago
- A Multi-Session and Multi-Therapy Benchmark for High-Realism AI Psychological Counselor☆30Jan 13, 2026Updated last month
- Bias Correction of Learned Generative Models using Likelihood-Free Importance Weighting☆12Mar 24, 2023Updated 2 years ago
- Code for Augment & Reduce, a scalable stochastic algorithm for large categorical distributions☆10May 16, 2018Updated 7 years ago
- Dcraw native builds for Windows OS☆12Dec 29, 2017Updated 8 years ago
- Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks☆12Sep 1, 2023Updated 2 years ago
- ☆19May 14, 2025Updated 9 months ago
- The contrastive token loss function for reducing generative repetition of autoregressive neural language models.☆13May 11, 2022Updated 3 years ago
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1☆13Apr 23, 2025Updated 10 months ago
- ☆10Jun 3, 2019Updated 6 years ago
- 这是一个面向币圈新手的入门速通指南集合,包括最全面的币圈区块链资源集合,包含各类工具导航,快速了解币圈常用术语和行话,详细的防骗指南,助你规避各类风险☆19Feb 10, 2026Updated 3 weeks ago
- An implementation of Compositional Attention: Disentangling Search and Retrieval by MILA☆14Jun 1, 2022Updated 3 years ago
- Zen-NAS, a lightning fast, training-free Neural Architecture Searching algorithm☆11Nov 12, 2021Updated 4 years ago
- Implementation of "Semi-Supervised Crowd Counting with Contextual Modeling: Facilitating Holistic Understanding of Crowd Scenes"☆12Oct 2, 2024Updated last year
- ☆11Mar 22, 2024Updated last year
- Various object detection testing using YOLO and other algorithms, Raspberry pi based integration experiments.☆12Dec 9, 2024Updated last year
- [ECCV 2022] "TALISMAN: Targeted Active Learning for Object Detection with Rare Classes and Slices using Submodular Mutual Information" by…☆10Sep 21, 2022Updated 3 years ago
- Machine Learning Reading Group☆11Sep 15, 2023Updated 2 years ago
- ☆15Aug 19, 2024Updated last year
- Compilation of ML/AI Resources for Members of MITxHarvard Women in AI☆11Mar 28, 2022Updated 3 years ago