chziakas / redeval
A library for red-teaming LLM applications with LLMs.
☆27 · Updated 9 months ago
Alternatives and similar repositories for redeval
Users interested in redeval are comparing it to the libraries listed below.
- Red-Teaming Language Models with DSPy ☆202 · Updated 5 months ago
- A prompt injection game to collect data for robust ML research ☆62 · Updated 5 months ago
- A subset of jailbreaks automatically discovered by the Haize Labs haizing suite. ☆92 · Updated 3 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆17 · Updated 10 months ago
- Sphynx Hallucination Induction ☆53 · Updated 5 months ago
- Code for the paper "Fishing for Magikarp" ☆159 · Updated 2 months ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆106 · Updated 7 months ago
- ☆34 · Updated 8 months ago
- Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs) ☆25 · Updated 7 months ago
- ☆70 · Updated this week
- This project investigates the security of large language models by performing binary classification of a set of input prompts to discover… ☆42 · Updated last year
- ☆19 · Updated last year
- The official repository of the paper "On the Exploitability of Instruction Tuning". ☆64 · Updated last year
- Test LLMs against jailbreaks and unprecedented harms ☆32 · Updated 9 months ago
- ☆121 · Updated last month
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆48 · Updated 3 months ago
- Official repository for the paper "ALERT: A Comprehensive Benchmark for Assessing Large Language Models’ Safety through Red Teaming" ☆43 · Updated 10 months ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆109 · Updated last year
- PyTorch library for Active Fine-Tuning ☆87 · Updated 5 months ago
- CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts. ☆48 · Updated 8 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 · Updated 5 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆90 · Updated 8 months ago
- ☆45 · Updated 3 months ago
- Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps" ☆128 · Updated 11 months ago
- ☆75 · Updated 8 months ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models ☆116 · Updated last year
- Measuring the situational awareness of language models ☆37 · Updated last year
- ☆24 · Updated 8 months ago
- Codebase for Obfuscated Activations Bypass LLM Latent-Space Defenses ☆21 · Updated 5 months ago
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives ☆69 · Updated last year