ryoungj / ToolEmu
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
☆142Updated last year
Alternatives and similar repositories for ToolEmu
Users who are interested in ToolEmu are comparing it to the libraries listed below:
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"☆107Updated last year
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆133Updated 10 months ago
- ☆76Updated last month
- Improving Alignment and Robustness with Circuit Breakers☆208Updated 8 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following☆127Updated 10 months ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs☆77Updated 6 months ago
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models☆76Updated last month
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning☆93Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆97Updated 3 months ago
- Code for paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23)☆87Updated last year
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898☆219Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated last year
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents☆88Updated 3 months ago
- Code for the paper <SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning>☆49Updated last year
- ICLR2024 Paper. Showing properties of safety tuning and exaggerated safety.☆84Updated last year
- ☆173Updated last year
- ☆31Updated last year
- Dataset for the Tensor Trust project☆40Updated last year
- Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)☆61Updated 4 months ago
- 【ACL 2024】 SALAD benchmark & MD-Judge☆147Updated 2 months ago
- Official implementation of ICLR'24 paper, "Curiosity-driven Red Teaming for Large Language Models" (https://openreview.net/pdf?id=4KqkizX…☆75Updated last year
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups☆32Updated 5 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆206Updated 3 weeks ago
- A novel approach to improve the safety of large language models, enabling them to transition effectively from unsafe to safe state.☆60Updated 2 weeks ago
- Code for the paper 🌳 Tree Search for Language Model Agents☆200Updated 10 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆141Updated 7 months ago
- ☆81Updated 6 months ago
- WMDP is an LLM proxy benchmark for hazardous knowledge in bio, cyber, and chemical security. We also release code for RMU, an unlearning m…☆121Updated last week
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆136Updated 6 months ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆76Updated 3 weeks ago