lasgroup/SafetyPolytope

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/lasgroup/SafetyPolytope)

lasgroup / SafetyPolytope

Learning Safety Constraints for Large Language Models (ICML2025)

☆35

Alternatives and similar repositories for SafetyPolytope

Users that are interested in SafetyPolytope are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

yuki-younai / MTSA
View on GitHub
offical implementation of MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
☆16Jun 2, 2025Updated last year
mahmoudkanazzal / PromSec
View on GitHub
☆12Dec 22, 2025Updated 7 months ago
jqueeney / robust-safe-rl
View on GitHub
Robust and safe deep reinforcement learning algorithms
☆17Mar 27, 2024Updated 2 years ago
yardenas / actsafe
View on GitHub
Scaling safe exploration to vision control
☆15Feb 19, 2025Updated last year
CGCL-codes / Gen-AF
View on GitHub
The implementation of our IEEE S&P 2024 paper "Securely Fine-tuning Pre-trained Encoders Against Adversarial Examples".
☆11Jun 28, 2024Updated 2 years ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
EladSharony / miso
View on GitHub
MISO: Learning Multiple Initial Solutions to Optimization Problems
☆18Jun 18, 2026Updated last month
THU-KEG / SafetyNeuron
View on GitHub
Data and code for the paper: Finding Safety Neurons in Large Language Models
☆29Jan 29, 2026Updated 5 months ago
RJ-T / NIPS2022_EP_BNP
View on GitHub
Official Implementation of NIPS 2022 paper Pre-activation Distributions Expose Backdoor Neurons
☆15Jan 13, 2023Updated 3 years ago
KumarRobotics / risk_mpc
View on GitHub
☆13Aug 17, 2025Updated 11 months ago
thu-ml / STAIR
View on GitHub
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆89Feb 26, 2025Updated last year
drzekunguo / LLM_Agent_Power_System_Balance
View on GitHub
☆14Mar 30, 2025Updated last year
lasgroup / safe-learning
View on GitHub
A collection of algorithms and experiment tools for safe sim to real transfer in robotics.
☆28May 19, 2026Updated 2 months ago
MiracleHH / nas_privacy
View on GitHub
Official Code Implementation for the CCS 2022 Paper "On the Privacy Risks of Cell-Based NAS Architectures"
☆11Nov 21, 2022Updated 3 years ago
KumarRobotics / lang-air-ground-teaming
View on GitHub
☆15Jun 30, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
lisa-wm / entropybaseduq
View on GitHub
☆12Apr 4, 2025Updated last year
Lingkai-Kong / RE-Control
View on GitHub
Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective
☆35Jan 31, 2025Updated last year
IBM / sae-steering
View on GitHub
Code to enable layer-level steering in LLMs using sparse auto encoders
☆34Sep 18, 2025Updated 10 months ago
BigdogManLuo / Wind-ESS-Optimization-in-Frequency-Regulation-Market
View on GitHub
Code implementation of “Flexible Coordination of Wind Generators and Energy Storages in joint Energy and Frequency Regulation Market“
☆11Sep 26, 2023Updated 2 years ago
mahaitongdae / Feasible-Actor-Critic
View on GitHub
Code for paper Feasible Actor-Critic: Constrained Reinforcement Learning for Ensuring Statewise Safety.
☆20May 22, 2022Updated 4 years ago
CryptoAILab / misalignment
View on GitHub
[NDSS'25] The official implementation of safety misalignment.
☆19Jan 8, 2025Updated last year
facebookresearch / prompt-siren
View on GitHub
A research workbench for developing and testing attacks against large language models, with a focus on prompt injection vulnerabilities a…
☆55Updated this week
KumarRobotics / RT-GuIDE
View on GitHub
[RA-L 2025] RT-GuIDE: Real-Time Gaussian Splatting for Information-Driven Exploration
☆23Nov 30, 2025Updated 7 months ago
zhxieml / remiss-jailbreak
View on GitHub
☆33Jun 24, 2024Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
SimonZhan-code / Step-Wise_SafeRL_Pixel
View on GitHub
Code space for L4DC paper "State-wise Safe Reinforcement Learning With Pixel Observations"
☆11Apr 5, 2024Updated 2 years ago
AurelianTactics / dqnclipped_dqnreg_prelim_implementation
View on GitHub
Implementing DQNClipped and DQNReg Algorithms
☆10Mar 2, 2021Updated 5 years ago
Unispac / Fight-Poison-With-Poison
View on GitHub
Code repository for the paper --- [USENIX Security 2023] Towards A Proactive ML Approach for Detecting Backdoor Poison Samples
☆31Jul 11, 2023Updated 3 years ago
EsYoon7 / RLHF-TLCR
View on GitHub
[ACL'24 Findings] Official code for "TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback"
☆12Dec 6, 2024Updated last year
SeanXLee / universal-pe-starterkit
View on GitHub
A modular power electronics hardware starter kit designed for sub-100V applications. Originally developed for the National Electronic Des…
☆18Jan 5, 2026Updated 6 months ago
dmar-bonn / stair
View on GitHub
[IROS2024] STAIR: Semantic-Targeted Active Implicit Reconstruction
☆16Aug 3, 2024Updated last year
euanong / image-hijacks
View on GitHub
Official codebase for Image Hijacks: Adversarial Images can Control Generative Models at Runtime
☆56Sep 19, 2023Updated 2 years ago
ZZR0 / CodeAttack
View on GitHub
Adversarial Attack for Pre-trained Code Models
☆10Jul 19, 2022Updated 4 years ago
skylineeeeen / DOTA
View on GitHub
☆17Jul 2, 2026Updated 2 weeks ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
ZrW00 / MuScleLoRA
View on GitHub
The code implementation of MuScleLoRA (Accepted in ACL 2024)
☆10Dec 1, 2024Updated last year
frost-beta / train-model-with-js
View on GitHub
Train text generation model with JavaScript.
☆15Jul 14, 2024Updated 2 years ago
CHATS-lab / LLMs_Encode_Harmfulness_Refusal_Separately
View on GitHub
☆41Jul 3, 2026Updated 2 weeks ago
yardenas / panda-rl-kit
View on GitHub
Deploy RL on your Real-World Franka Emika Panda
☆15Feb 22, 2026Updated 5 months ago
ming93 / Safe_reinforcement_learning
View on GitHub
Convergent Policy Optimization for Safe Reinforcement Learning
☆11Oct 26, 2019Updated 6 years ago
XiongPengNUS / PandaShifu
View on GitHub
☆18May 14, 2026Updated 2 months ago
CyberAgentAILab / filtered-dpo
View on GitHub
[EMNLP 2024] Introducing Filtered Direct Preference Optimization (fDPO) that enhances language model alignment with human preferences by …
☆16Nov 27, 2024Updated last year