XuanChen-xc / RLbreaker
Code for "When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search" (NeurIPS 2024)
☆17 · Oct 22, 2024 · Updated last year
Alternatives and similar repositories for RLbreaker
Users who are interested in RLbreaker are comparing it to the libraries listed below.
- The official repository for guided jailbreak benchmark ☆28 · Jul 28, 2025 · Updated 6 months ago
- ☆35 · May 21, 2025 · Updated 8 months ago
- Code for Findings-EMNLP 2023 paper: Multi-step Jailbreaking Privacy Attacks on ChatGPT ☆35 · Oct 15, 2023 · Updated 2 years ago
- Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch ☆10 · Aug 7, 2024 · Updated last year
- Code for experiments on self-prediction as a way to measure introspection in LLMs ☆16 · Dec 10, 2024 · Updated last year
- [EMNLP 2025 Oral] IPIGuard: A Novel Tool Dependency Graph-Based Defense Against Indirect Prompt Injection in LLM Agents ☆16 · Sep 16, 2025 · Updated 5 months ago
- my profile readme ☆14 · Updated this week
- This is the official code repository for the paper: Towards General Continuous Memory for Vision-Language Models. ☆19 · Jul 3, 2025 · Updated 7 months ago
- General AI evaluation and Gauge Engine. A unified evaluation engine for LLMs, MLLMs, audio, and diffusion models. ☆40 · Updated this week
- Reference implementation of Thin and Deep Gaussian Processes (NeurIPS 2023) ☆13 · Nov 25, 2024 · Updated last year
- ☆12 · Jul 8, 2024 · Updated last year
- Code Implementation of Adversarial Prompt Evaluation paper ☆13 · Sep 18, 2025 · Updated 4 months ago
- Official Code for What Makes and Breaks Safety Fine-tuning? A Mechanistic Study (NeurIPS 2024) ☆12 · Oct 31, 2024 · Updated last year
- [ICML 2025] Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions ☆14 · Sep 27, 2025 · Updated 4 months ago
- Predicting the Stock Market - Can we do it? ☆10 · Jul 24, 2021 · Updated 4 years ago
- This approach of Intrusion Detection uses two GPT models, which are trained on normal network traffic, to predict sequences of communicat… ☆11 · Oct 3, 2023 · Updated 2 years ago
- ☆14 · May 21, 2024 · Updated last year
- ☆10 · Dec 18, 2024 · Updated last year
- 🎮 A configurable Breakout environment for reinforcement learning ☆11 · Mar 20, 2018 · Updated 7 years ago
- Implementation of the paper "Improving the Accuracy-Robustness Trade-off of Classifiers via Adaptive Smoothing". ☆10 · Feb 6, 2024 · Updated 2 years ago
- [IJCAI'25 Workshop Oral] The 1st place solution of IJCAI 2025 challenge track 1: Image Detection and Localization ☆33 · Dec 4, 2025 · Updated 2 months ago
- A Benchmark for Multi-Stage Legal Case Documents Generation ☆14 · Feb 24, 2025 · Updated 11 months ago
- SEU Summer School project, based on Kotlin and Java. ☆13 · Sep 15, 2023 · Updated 2 years ago
- Python package for compressing floating-point PyTorch tensors ☆13 · Jul 22, 2024 · Updated last year
- Minimal Transformer base in JAX. A single backbone for language modelling, diffusion, classification, etc... ☆14 · May 28, 2025 · Updated 8 months ago
- Learning to Skip the Middle Layers of Transformers ☆17 · Aug 7, 2025 · Updated 6 months ago
- A Java-based framework for combinatorial test input generation, fault characterization and automated test execution. ☆11 · Jan 22, 2024 · Updated 2 years ago
- Unofficial implementation of "Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle" ☆13 · Jul 3, 2024 · Updated last year
- ☆13 · Aug 7, 2024 · Updated last year
- 🤖 Implementation of Self Normalizing Networks (SNN) in PyTorch. ☆12 · Jun 19, 2017 · Updated 8 years ago
- This is a Pytorch Implementation of the DASP algorithm from the paper "Explaining Deep Neural Networks with a Polynomial Time Algorithm f… ☆11 · Jun 12, 2020 · Updated 5 years ago
- 1st Place Team Crane: @aswinkumar1999 @rathull @kyolebu ☆29 · Sep 8, 2025 · Updated 5 months ago
- The course work repo for UoSurrey EEEM071 (2023 Spring) ☆11 · May 9, 2023 · Updated 2 years ago
- This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024) ☆49 · Jan 15, 2026 · Updated last month
- ☆53 · May 24, 2023 · Updated 2 years ago
- Multi-encoder segmentation for contrail detection in satellite imagery | Google Researc ☆11 · Jan 28, 2026 · Updated 2 weeks ago
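- Smart tool that greatly speeds up course progress on the 南大 LMS smart education platform / automatic CAPTCHA recognition / one-click download of all courseware ☆37 · Jan 8, 2026 · Updated last month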
- Materials for "Multi-property Steering of Large Language Models with Dynamic Activation Composition" ☆14 · Nov 22, 2024 · Updated last year
- Pytorch implementation of gradCAM, guidedBackProp, smoothGrad ☆13 · Mar 5, 2019 · Updated 6 years ago