xbmxb / EnvDistraction
☆23 · Updated last year
Alternatives and similar repositories for EnvDistraction
Users that are interested in EnvDistraction are comparing it to the libraries listed below
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024) ☆99 · Updated last month
- ☆37 · Updated last year
- ☆174 · Updated 3 months ago
- ☆51 · Updated last year
- [ACL 2024] SALAD benchmark & MD-Judge ☆171 · Updated 11 months ago
- [ICLR 2025] Dissecting adversarial robustness of multimodal language model agents ☆123 · Updated 11 months ago
- Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups ☆50 · Updated last year
- TrustAgent: Towards Safe and Trustworthy LLM-based Agents ☆56 · Updated last year
- ☆99 · Updated 5 months ago
- The reinforcement learning code for the SPA-VL dataset ☆44 · Updated last year
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆191 · Updated last year
- Code and data for the paper: Competing Large Language Models in Multi-Agent Gaming Environments ☆95 · Updated 2 weeks ago
- [EMNLP 2024] Multi-modal reasoning problems via code generation ☆27 · Updated last year
- Code for the paper "Defending against LLM Jailbreaking via Backtranslation" ☆34 · Updated last year
- ☆33 · Updated last year
- ☆44 · Updated last year
- [ICLR 2025] Official codebase for the paper "Multimodal Situational Safety" ☆30 · Updated 7 months ago
- [NeurIPS 2024] Accelerating Greedy Coordinate Gradient and General Prompt Optimization via Probe Sampling ☆33 · Updated last year
- Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning …☆88Updated 5 months ago
- [ACL 2025 Best Paper] Language Models Resist Alignment ☆41 · Updated 7 months ago
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning ☆98 · Updated last year
- A novel approach to improve the safety of large language models, enabling them to transition effectively from an unsafe to a safe state ☆71 · Updated 8 months ago
- A Lightweight Visual Reasoning Benchmark for Evaluating Large Multimodal Models through Complex Diagrams in Coding Tasks ☆14 · Updated 11 months ago
- ☆56 · Updated last year
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability ☆18 · Updated last year
- ☆70 · Updated 2 years ago
- BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs) ☆175 · Updated 2 years ago
- Code, benchmark, and environment for "OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows" ☆37 · Updated 3 months ago
- [ICLR 2024] Paper showing properties of safety tuning and exaggerated safety ☆93 · Updated last year
- Official repository for the ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models" ☆106 · Updated 8 months ago