ryoungj / ToolEmuLinks

[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use

☆172

Alternatives and similar repositories for ToolEmu

Users that are interested in ToolEmu are comparing it to the libraries listed below

Sorting:

ucl-dark / llm_debate
Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"
☆122Updated last year
sambanova / toolbench
ToolBench, an evaluation suite for LLM tool manipulation capabilities.
☆165Updated last year
tonychenxyz / selfie
This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,…
☆54Updated 11 months ago
Princeton-SysML / Jailbreak_LLM
☆189Updated 2 years ago
openai / safety-rbr-code-and-data
Code and example data for the paper: Rule Based Rewards for Language Model Safety
☆202Updated last year
princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
☆230Updated last year
XuandongZhao / weak-to-strong
[ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models
☆90Updated 7 months ago
ScalerLab / JudgeBench
☆105Updated last year
rxlqn / awesome-llm-self-reflection
augmented LLM with self reflection
☆135Updated 2 years ago
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆147Updated last year
Lordog / R-Judge
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)
☆93Updated 6 months ago
StonyBrookNLP / appworld
🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource…
☆324Updated 3 weeks ago
qishenghu / InstructCoder
InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw
☆64Updated last year
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆112Updated 4 months ago
swj0419 / detect-pretrain-code
This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…
☆235Updated 2 years ago
uw-nsl / SafeDecoding
Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
☆151Updated last year
safety-research / persona_vectors
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
☆296Updated 4 months ago
GraySwanAI / circuit-breakers
Improving Alignment and Robustness with Circuit Breakers
☆245Updated last year
SafeAILab / RAIN
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
☆98Updated last year
kohjingyu / search-agents
Code for the paper 🌳 Tree Search for Language Model Agents
☆216Updated last year
zorazrw / awesome-tool-llm
☆241Updated last year
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆116Updated 9 months ago
thestephencasper / explore_establish_exploit_llms
☆31Updated 2 years ago
allenai / wildguard
Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
☆95Updated last year
agiresearch / TrustAgent
TrustAgent: Towards Safe and Trustworthy LLM-based Agents
☆54Updated 10 months ago
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆43Updated last year
SALT-NLP / DyLAN
Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization
☆181Updated last year
OSU-NLP-Group / AgentAttack
☆22Updated last year
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆94Updated 6 months ago
zhangxjohn / LLM-Agent-Benchmark-List
A banchmark list for evaluation of large language models.
☆152Updated 3 months ago