☆43May 9, 2025Updated 11 months ago
Alternatives and similar repositories for SHADE-Arena
Users that are interested in SHADE-Arena are comparing it to the libraries listed below.
- ☆24Jun 22, 2025Updated 9 months ago
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments.☆177Updated this week
- ☆25Aug 23, 2025Updated 7 months ago
- ☆14Jun 11, 2025Updated 10 months ago
- Röttger et al. (2025): "MSTS: A Multimodal Safety Test Suite for Vision-Language Models"☆16Mar 31, 2025Updated last year
- Align your LM to express calibrated verbal statements of confidence in its long-form generations.☆29Jun 4, 2024Updated last year
- A collection of different ways to implement accessing and modifying internal model activations for LLMs☆22Oct 18, 2024Updated last year
- ☆45Nov 3, 2025Updated 5 months ago
- Code repo for the model organisms and convergent directions of EM papers.☆59Sep 22, 2025Updated 6 months ago
- Clover: Closed-Loop Verifiable Code Generation☆45May 12, 2025Updated 11 months ago
- ☆27Oct 6, 2024Updated last year
- ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks☆38Apr 5, 2026Updated last week
- Code for our NAACL2025 accepted paper: Attention Tracker: Detecting Prompt Injection Attacks in LLMs☆23Sep 19, 2025Updated 6 months ago
- A hackable, simple, and research-friendly GRPO Training Framework with high speed weight synchronization in a multinode environment.☆37Aug 27, 2025Updated 7 months ago
- Improving Alignment and Robustness with Circuit Breakers☆260Sep 24, 2024Updated last year
- Experiments with representation engineering☆14Feb 28, 2024Updated 2 years ago
- ☆52Nov 19, 2025Updated 4 months ago
- ☆119Feb 11, 2025Updated last year
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective"☆32Jul 22, 2024Updated last year
- ☆11Oct 16, 2023Updated 2 years ago
- A self-updating GitHub profile 🐯☆15Updated this week
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods☆180Mar 12, 2026Updated last month
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m…☆164May 29, 2025Updated 10 months ago
- Open Source Replication of Anthropic's Alignment Faking Paper☆56Apr 4, 2025Updated last year
- TextComplexityDE dataset consists of 1000 sentences in the German language with subjective complexity rating, collected from German learn…☆13Apr 8, 2022Updated 4 years ago
- Code for ICML 2023 paper "When and How Does Known Class Help Discover Unknown Ones? Provable Understandings Through Spectral Analysis"☆14Jun 24, 2023Updated 2 years ago
- CyclesGym: an OpenAI gym interface to the Cycles agricultural simulator☆16Aug 10, 2022Updated 3 years ago
- ☆20Nov 15, 2024Updated last year
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography.☆25Jan 26, 2024Updated 2 years ago
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts☆16Feb 26, 2024Updated 2 years ago
- DafnyBench: A Benchmark for Formal Software Verification☆61Dec 12, 2024Updated last year
- CropGPT☆21Nov 2, 2024Updated last year
- Codebase for Linguistic Collapse: Neural Collapse in (Large) Language Models [NeurIPS 2024] [arXiv:2405.17767]☆18Apr 14, 2025Updated last year
- A lightweight library for large language model (LLM) jailbreaking defense.☆61Sep 11, 2025Updated 7 months ago
- ☆40Feb 11, 2025Updated last year
- ☆18Mar 30, 2025Updated last year
- Code accompanying the paper "A Language Model's Guide Through Latent Space". It contains functionality for training and using concept vec…☆21Feb 23, 2024Updated 2 years ago
- Code Release for the 2023 NeurIPS Paper How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained langua…☆17Dec 6, 2024Updated last year
- Code to train and test Word Sense Disambiguation models based on different pretrained transformers.☆15Dec 21, 2021Updated 4 years ago