RobustNLP/DeRTa


A novel approach to improve the safety of large language models, enabling them to transition effectively from an unsafe to a safe state.

☆72

Alternatives and similar repositories for DeRTa

Users interested in DeRTa are comparing it to the repositories listed below.

CUHK-Shenzhen-SE / RetromorphicTesting
☆11 · Jan 19, 2025 · Updated last year

CUHK-Shenzhen-SE / D4C
[ICSE'25] Aligning the Objective of LLM-based Program Repair
☆24 · Mar 8, 2025 · Updated last year

LGU-SE-Internal / GRev
A lightweight tool for detecting bugs on Graph Database Management Systems
☆15 · Jan 9, 2024 · Updated 2 years ago

AI45Lab / CodeAttack
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
☆61 · Oct 1, 2025 · Updated 9 months ago

xyliu-cs / RISE
[NeurIPS'25] Official Implementation of RISE (Reinforcing Reasoning with Self-Verification)
☆33 · Aug 8, 2025 · Updated 11 months ago

Jarviswang94 / MMSafetyAwareness
Multimodal Safety Awareness Benchmark for Large Language Models
☆15 · Jun 3, 2025 · Updated last year

Skytliang / SpyGame
SpyGame: An interactive multi-agent framework to evaluate intelligence with large language models :D
☆15 · Nov 9, 2023 · Updated 2 years ago

Jarviswang94 / Multilingual_safety_benchmark
Multilingual safety benchmark for Large Language Models
☆53 · Sep 1, 2024 · Updated last year

gaiusyu / Denum
A log compression tool (ASE2024)
☆17 · Apr 15, 2025 · Updated last year

RobustNLP / CipherChat
A framework to evaluate the generalization capability of safety alignment for LLMs
☆628 · Oct 9, 2025 · Updated 9 months ago

Jarviswang94 / MTTM
MTTM: Metamorphic Testing for Textual Content Moderation Software
☆31 · Feb 10, 2023 · Updated 3 years ago

RobustNLP / TestTranslation
A toolkit for testing machine translation [ICSE'20, '21, ESEC/FSE'20]
☆33 · Nov 15, 2021 · Updated 4 years ago

RobustNLP / TestNER
A toolkit for testing and improving named entity recognition [ESEC/FSE'23]
☆11 · Aug 31, 2023 · Updated 2 years ago

thu-coai / JailbreakDefense_GoalPriority
[ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization
☆29 · Jul 9, 2024 · Updated 2 years ago

alphadl / SafeLLM_with_IntentionAnalysis
Towards Safe LLM with our simple-yet-highly-effective Intention Analysis Prompting
☆21 · Mar 25, 2024 · Updated 2 years ago

SaFo-Lab / ReasoningBomb
[CCS 2026] The official implementation of our CCS 2026 paper "ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathological…
☆15 · Jun 24, 2026 · Updated 3 weeks ago

CUHK-ARISE / GAMABench
Code and data for the paper: Competing Large Language Models in Multi-Agent Gaming Environments
☆98 · Jan 26, 2026 · Updated 5 months ago

hexuandeng / Mono4SiMT
The implementation for our paper, "Improving Simultaneous Machine Translation with Monolingual Data," accepted to AAAI 2023. 🎉
☆12 · Jul 19, 2023 · Updated 3 years ago

dxhou / CoAct
☆32 · Jul 8, 2024 · Updated 2 years ago

thu-coai / SafeUnlearning
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks
☆32 · Jul 9, 2024 · Updated 2 years ago

penguinnnnn / awesome-llm-and-society
Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.
☆51 · Nov 3, 2023 · Updated 2 years ago

Dtc7w3PQ / Response-Attack
Official implementation of “Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models” (AAAI 2026).
☆37 · Mar 22, 2026 · Updated 3 months ago

VMnK-Run / MARVEL
[ASE2024] Mutual Learning-Based Framework for Enhancing Robustness of Code Models via Adversarial Training
☆11 · Sep 13, 2024 · Updated last year

AI45Lab / ActorAttack
☆135 · Jun 29, 2026 · Updated 3 weeks ago

wxjiao / InstructMT
A collection of instruction data and scripts for machine translation.
☆20 · Sep 23, 2023 · Updated 2 years ago

uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆154 · Jul 19, 2024 · Updated 2 years ago

AI45Lab / DeepScan
Diagnostic Framework for LLMs and MLLMs
☆39 · Mar 2, 2026 · Updated 4 months ago

AISG-Technology-Team / GCSS-Track-1A-Submission-Guide
Submission Guide + Discussion Board for AI Singapore Global Challenge for Safe and Secure LLMs (Track 1A).
☆16 · Jul 4, 2024 · Updated 2 years ago

MingyuJ666 / LVLM-Safety
[FCS'24] LVLM Safety paper
☆19 · Jan 4, 2025 · Updated last year

wooozihui / jailbreakfunction
[COLING 2025] Official code of the paper "The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models"
☆59 · Dec 26, 2024 · Updated last year

SanRazor-repo / SanRazor
SanRazor is a sanitizer check reduction tool aiming to incur little overhead while retaining all important sanitizer checks.
☆56 · Jun 6, 2021 · Updated 5 years ago

zwhe99 / LLM-MT-Eval
{DeepL, Google, WMT-Best, davinci-003, turbo, gpt-4} × {En-De, En-Cs, En-Ru, En-Zh, De-Fr, En-Ja, Uk-En, Uk-Cs, En-Hr, En-Ha, En-Is}
☆14 · Jun 18, 2023 · Updated 3 years ago

Atrewin / PGen
Implementation of our paper "Scaling Back-Translation with Domain Text Generation for Sign Language Gloss Translation". Accepted in EACL …
☆11 · May 22, 2023 · Updated 3 years ago

RickySkywalker / LeanOfThought-Official
This is the official implementation for MA-LoT.
☆20 · Aug 4, 2025 · Updated 11 months ago

xyq7 / GradSafe
Official Code for ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis"
☆68 · Oct 27, 2024 · Updated last year

open-compass / Ada-LEval
The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"
☆56 · May 22, 2025 · Updated last year

pillowsofwind / Course-Correction
[EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"
☆20 · Oct 2, 2024 · Updated last year

CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆31 · Mar 5, 2024 · Updated 2 years ago

kennymckormick / ARAS-Dataset
☆11 · Nov 5, 2024 · Updated last year