ryoungj / ToolEmu
[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use
☆172Updated last year
Alternatives and similar repositories for ToolEmu
Users who are interested in ToolEmu are comparing it to the libraries listed below
- Code release for "Debating with More Persuasive LLMs Leads to More Truthful Answers"☆122Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities.☆165Updated last year
- This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen,…☆54Updated 11 months ago
- ☆189Updated 2 years ago
- Code and example data for the paper: Rule Based Rewards for Language Model Safety☆202Updated last year
- [NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898☆230Updated last year
- [ICML 2025] Weak-to-Strong Jailbreaking on Large Language Models☆90Updated 7 months ago
- ☆105Updated last year
- augmented LLM with self reflection☆135Updated 2 years ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆147Updated last year
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)☆93Updated 6 months ago
- 🌍 AppWorld: A Controllable World of Apps and People for Benchmarking Function Calling and Interactive Coding Agent, ACL'24 Best Resource…☆324Updated 3 weeks ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw☆64Updated last year
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples☆112Updated 4 months ago
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji…☆235Updated 2 years ago
- Official Repository for ACL 2024 Paper SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding☆151Updated last year
- Persona Vectors: Monitoring and Controlling Character Traits in Language Models☆296Updated 4 months ago
- Improving Alignment and Robustness with Circuit Breakers☆245Updated last year
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning☆98Updated last year
- Code for the paper 🌳 Tree Search for Language Model Agents☆216Updated last year
- ☆241Updated last year
- Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"☆116Updated 9 months ago
- ☆31Updated 2 years ago
- Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs☆95Updated last year
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents☆54Updated 10 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆43Updated last year
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization☆181Updated last year
- ☆22Updated last year
- Systematic evaluation framework that automatically rates overthinking behavior in large language models.☆94Updated 6 months ago
- A benchmark list for evaluation of large language models.☆152Updated 3 months ago