The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1
☆13Apr 23, 2025Updated 11 months ago
Alternatives and similar repositories for Attacker
Users that are interested in Attacker are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆19May 14, 2025Updated 10 months ago
- Generalized Optimal Transport Attention with Trainable Priors☆26Jan 25, 2026Updated last month
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024)☆12Oct 31, 2024Updated last year
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions☆14Mar 7, 2026Updated 2 weeks ago
- Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning☆22Jul 8, 2024Updated last year
- Welcome to the official repository for Siren, a project aimed at understanding and mitigating harmful behaviors in large language models …☆15Sep 12, 2025Updated 6 months ago
- [ACL 2024] Benchmarking Knowledge Boundary for Large Language Models: A Different Perspective on Model Evaluation☆10May 26, 2024Updated last year
- AISafetyLab: A comprehensive framework covering safety attack, defense, evaluation and paper list.☆235Aug 29, 2025Updated 6 months ago
- an universal pytorch deep learning experiment codebase☆11Mar 31, 2025Updated 11 months ago
- ☆16Jun 25, 2025Updated 8 months ago
- Official Implementation for "Purifying Quantization-conditioned Backdoors via Layer-wise Activation Correction with Distribution Approxim…☆12Aug 14, 2024Updated last year
- [EMNLP 2025] The code repo of paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Com…☆40Nov 24, 2025Updated 3 months ago
- Data and codes for EMNLP 2022 paper "CDConv: A Benchmark for Contradiction Detection in Chinese Conversations"☆13May 8, 2023Updated 2 years ago
- The official implementation of the paper "Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks".☆19Apr 19, 2024Updated last year
- [ICML 2025] Official repository for paper "OR-Bench: An Over-Refusal Benchmark for Large Language Models"☆25Mar 4, 2025Updated last year
- A Recipe for Building LLM Reasoners to Solve Complex Instructions☆31Oct 9, 2025Updated 5 months ago
- ☆13Nov 7, 2023Updated 2 years ago
- [ICLR 2025] Official implementation for "SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanati…☆43Feb 11, 2025Updated last year
- Implementation of many popular AI algorithms to play the game of Pacman such as Minimax, Expectimax and Greedy.☆13Apr 12, 2020Updated 5 years ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆53Jun 24, 2024Updated last year
- ☆24Oct 14, 2024Updated last year
- 提供多款 Shadowrocket 规则,带广告过滤功能。用于 iOS 未越狱设备选择性地自动翻墙。shadowrocket 配置:https://raw.githubusercontent.com/lhie1/Rules/master/Shadowrocket.conf…☆14Oct 12, 2019Updated 6 years ago
- [NeurIPS 2025] Reasoning Models Better Express Their Confidence"☆22Nov 19, 2025Updated 4 months ago
- Research on "Many-Shot Jailbreaking" in Large Language Models (LLMs). It unveils a novel technique capable of bypassing the safety mechan…☆16Aug 6, 2024Updated last year
- Robust Point Cloud Processing through Positional Embedding☆13Sep 7, 2023Updated 2 years ago
- Code for ICCV 2023 work "Generalized Few-Shot Point Cloud Segmentation Via Geometric Words"☆12Sep 26, 2023Updated 2 years ago
- Code of EMNLP 2025 paper 'UltraIF: Advancing Instruction Following from the Wild'.☆21Apr 3, 2025Updated 11 months ago
- Project for a Computer Security class based on CSAW capture the flag challenges☆13Mar 19, 2014Updated 12 years ago
- Official Code for ART: Automatic Red-teaming for Text-to-Image Models to Protect Benign Users (NeurIPS 2024)☆23Oct 23, 2024Updated last year
- The code of paper "Toward Optimal LLM Alignments Using Two-Player Games".☆17Jun 20, 2024Updated last year
- ☆34Nov 24, 2020Updated 5 years ago
- Pytorch implementation and comparison of Fourier Feature Networks and Sinusoidal Representation Networks☆13Jun 27, 2020Updated 5 years ago
- ☆22Apr 23, 2024Updated last year
- ☆15Aug 1, 2023Updated 2 years ago
- Vstream - Video Analytics pipeline with Hardware based accelerations (dev - stage)☆10Feb 2, 2024Updated 2 years ago
- ☆15Nov 26, 2020Updated 5 years ago
- Precision Knowledge Editing (PKE): A novel method to reduce toxicity in LLMs while preserving performance, with robust evaluations and ha…☆11Nov 26, 2024Updated last year
- The multi-view version of MonoDETR on nuScenes dataset☆21Nov 4, 2022Updated 3 years ago
- An implementation of MSSRM method☆11Mar 23, 2023Updated 3 years ago