Dataset for the Tensor Trust project
☆48Mar 17, 2024Updated 2 years ago
Alternatives and similar repositories for tensor-trust-data
Users that are interested in tensor-trust-data are comparing it to the libraries listed below.
- A prompt injection game to collect data for robust ML research☆69Jan 27, 2025Updated last year
- Official implementation of [USENIX Sec'25] StruQ: Defending Against Prompt Injection with Structured Queries☆65Nov 10, 2025Updated 4 months ago
- ☆23May 20, 2025Updated 10 months ago
- Official Repository for Can Language Models be Instructed to Protect Personal Information?☆13Oct 8, 2023Updated 2 years ago
- ACL 2023 (Findings) End-to-end Cross-lingual Label Project☆14Nov 24, 2023Updated 2 years ago
- Augmenting Statistical Models with Natural Language Parameters☆28Sep 17, 2024Updated last year
- RuLES: a benchmark for evaluating rule-following in language models☆249Feb 24, 2025Updated last year
- robust polynomial multiplication in modulo m☆19Apr 30, 2016Updated 9 years ago
- A toolkit to automatically crawl the paper list and download paper PDFs from the ACL Anthology.☆10Nov 12, 2025Updated 4 months ago
- Evaluate Transformers from the Hub 🔥☆14Nov 27, 2023Updated 2 years ago
- [ACL'24 Findings] Official code for "TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback"☆12Dec 6, 2024Updated last year
- PAL: Proxy-Guided Black-Box Attack on Large Language Models☆56Aug 17, 2024Updated last year
- [ICLR 2024] The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M…☆434Jan 22, 2025Updated last year
- Panda Guard is designed for researching jailbreak attacks, defenses, and evaluation algorithms for large language models (LLMs).☆66Jan 19, 2026Updated 2 months ago
- Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"!☆11Oct 16, 2024Updated last year
- Code and data for "Inferring Rewards from Language in Context" [ACL 2022].☆16May 22, 2022Updated 3 years ago
- ☆131Jul 7, 2025Updated 8 months ago
- Code for "A Principled Framework for Multi-View Contrastive Learning"☆20Jul 10, 2025Updated 8 months ago
- The repository describing a novel dataset for word association explanations☆13Sep 21, 2023Updated 2 years ago
- ☆19Aug 10, 2024Updated last year
- Improving Alignment and Robustness with Circuit Breakers☆258Sep 24, 2024Updated last year
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆151Jul 19, 2024Updated last year
- A Complex Judge, designed for ACOJ.☆10Sep 5, 2016Updated 9 years ago
- Finding trojans in aligned LLMs. Official repository for the competition hosted at SaTML 2024.☆116Jun 13, 2024Updated last year
- TAP: An automated jailbreaking method for black-box LLMs☆224Dec 10, 2024Updated last year
- Codes and datasets of the paper Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment☆108Mar 8, 2024Updated 2 years ago
- Public repo for ETH Escape CTF @ Devcon 2024: https://devcon.org/☆13Dec 11, 2024Updated last year
- Bilibili virtual streamer live-stream statistics☆13Jul 1, 2021Updated 4 years ago
- line-drawer recreates a given image by only drawing it by simple straight lines. Implementation inspired by linify.me and written in Pyth…☆14Aug 31, 2023Updated 2 years ago
- A symbolic benchmark for verifiable chain-of-thought financial reasoning. Includes executable templates, 58 topics across 12 domains, and…☆26Dec 26, 2025Updated 2 months ago
- ☆18Apr 7, 2025Updated 11 months ago
- Code for a research paper "Part-Based Models Improve Adversarial Robustness" (ICLR 2023)☆21Sep 16, 2023Updated 2 years ago
- ☆16Mar 22, 2025Updated last year
- [ICLR 2025] Code&Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"☆14Jun 21, 2024Updated last year
- ☆704Jul 2, 2025Updated 8 months ago
- ☆11Jan 19, 2025Updated last year
- A toolkit for testing and improving named entity recognition [ESEC/FSE'23]☆11Aug 31, 2023Updated 2 years ago
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training"☆11Oct 27, 2025Updated 4 months ago
- QL-Relax☆13Aug 12, 2025Updated 7 months ago