link-zju / ORL-Auditor
☆11 Updated 2 years ago
Alternatives and similar repositories for ORL-Auditor
Users who are interested in ORL-Auditor are comparing it to the repositories listed below.
- [NeurIPS 2020, Spotlight] State-Adversarial DQN (SA-DQN) for robust deep reinforcement learning ☆35 Updated 4 years ago
- [CCS 2025] DPImageBench is an open-source toolkit developed to facilitate the research and application of DP image synthesis. ☆27 Updated 3 weeks ago
- SaTML'23 paper "Backdoor Attacks on Time Series: A Generative Approach" by Yujing Jiang, Xingjun Ma, Sarah Monazam Erfani, and James Bail… ☆20 Updated 2 years ago
- [S&P 2024] Replication Package for "Mind Your Data! Hiding Backdoors in Offline Reinforcement Learning Datasets". ☆31 Updated last year
- Adversarial attacks on Deep Reinforcement Learning (RL) ☆97 Updated 4 years ago
- ☆21 Updated 3 years ago
- SVIP: Towards Verifiable Inference of Open-Source Large Language Models ☆13 Updated 6 months ago
- Learning Safety Constraints for Large Language Models (ICML 2025) ☆25 Updated 4 months ago
- Open source implementation of the TrojDRL algorithm presented in TrojDRL: Evaluation of backdoor attacks on Deep Reinforcement Learning ☆20 Updated 5 years ago
- ☆126 Updated 3 months ago
- [NeurIPS 2020, Spotlight] Code for "Robust Deep Reinforcement Learning against Adversarial Perturbations on Observations" ☆139 Updated 4 years ago
- Robust Reinforcement Learning with the Alternating Training of Learned Adversaries (ATLA) framework ☆67 Updated 4 years ago
- The collection of papers about Private Evolution ☆17 Updated 2 months ago
- [ICML 2025] "From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium" ☆31 Updated last month
- A united toolbox for running major robustness verification approaches for DNNs. [S&P 2023] ☆90 Updated 2 years ago
- [USENIX Security 2024] PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretrainin… ☆23 Updated last year
- Official implementation of the NeurIPS 2024 paper CORY ☆25 Updated last week
- A new model-based algorithm for offline inverse reinforcement learning ☆15 Updated 2 years ago
- ☆27 Updated 2 years ago
- ☆70 Updated 10 months ago
- Benchmarking Physical Risk Awareness of Foundation Model-based Embodied AI Agents ☆22 Updated last year
- This is the official repository for the ICLR 2025 accepted paper Badrobot: Manipulating Embodied LLMs in the Physical World. ☆39 Updated 6 months ago
- This is the code repository of our submission: Understanding the Dark Side of LLMs’ Intrinsic Self-Correction. ☆63 Updated last year
- [NeurIPS 2020 Spotlight] State-adversarial PPO for robust deep reinforcement learning ☆31 Updated 4 years ago
- Official PyTorch Implementation for Continual Learning and Private Unlearning ☆17 Updated 3 years ago
- ☆33 Updated 3 years ago
- Code for "Adversarial Illusions in Multi-Modal Embeddings" ☆30 Updated last year
- A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges ☆258 Updated 10 months ago
- Official codes for "Understanding Deep Gradient Leakage via Inversion Influence Functions", NeurIPS 2023 ☆16 Updated 2 years ago
- Code related to the paper "Machine Unlearning of Features and Labels" ☆72 Updated last year