[ICLR 2025] Official implementation for "SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations"
☆43Feb 11, 2025Updated last year
Alternatives and similar repositories for SafeWatch
Users that are interested in SafeWatch are comparing it to the libraries listed below
Sorting:
- ECSO (Make MLLM safe without neither training nor any external models!) (https://arxiv.org/abs/2403.09572)☆35Nov 2, 2024Updated last year
- Official implementation of T2Vs Meet VLMs: A Scalable Multimodal Dataset for Visual Harmfulness Recognition☆21Oct 23, 2024Updated last year
- [EMNLP 2025] The code repo of paper "X-Boundary: Establishing Exact Safety Boundary to Shield LLMs from Multi-Turn Jailbreaks without Com…☆39Nov 24, 2025Updated 3 months ago
- ☆18Mar 30, 2025Updated 11 months ago
- The reinforcement learning codes for dataset SPA-VL☆44Jun 24, 2024Updated last year
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts☆191Jun 26, 2025Updated 8 months ago
- [CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆65Updated this week
- The official repository for paper "MLLM-Protector: Ensuring MLLM’s Safety without Hurting Performance"☆44Apr 21, 2024Updated last year
- [USENIX'25] HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns☆13Mar 1, 2025Updated last year
- Generalized Optimal Transport Attention with Trainable Priors☆24Jan 25, 2026Updated last month
- [ICCV2025] Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning☆23Nov 13, 2025Updated 3 months ago
- Codebase, data and models for the Re-Thinking the Shuffle Test paper at ACL2021☆10Oct 14, 2022Updated 3 years ago
- Official repo of Exploring the Adversarial Vulnerabilities of Vision-Language-Action Models in Robotics☆68Jan 27, 2026Updated last month
- [TMLR'24] This repository includes the official implementation our paper "Unleashing the Power of Visual Prompting At the Pixel Level"☆42Apr 30, 2024Updated last year
- Active Learning Helps Pretrained Models Learn the Intended Task (https://arxiv.org/abs/2204.08491) by Alex Tamkin, Dat Nguyen, Salil Desh…☆11Nov 22, 2022Updated 3 years ago
- ☆13Aug 27, 2021Updated 4 years ago
- [KDD 2026 ADS Track] Pytorch implementation of the paper "Hi-Guard: Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasonin…☆21Jan 13, 2026Updated last month
- Official repository of "TDSD: Text-Driven Scene-Decoupled Weakly Supervised Video Anomaly Detection"☆11May 25, 2025Updated 9 months ago
- [CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection☆25Feb 10, 2026Updated 3 weeks ago
- ☆11Aug 9, 2022Updated 3 years ago
- The repo for using the model https://huggingface.co/thu-coai/Attacker-v0.1☆13Apr 23, 2025Updated 10 months ago
- Official code of "The Automated but Risky Game: Modeling Agent-to-Agent Negotiations and Transactions in Consumer Markets"☆23Sep 20, 2025Updated 5 months ago
- ☆19May 14, 2025Updated 9 months ago
- ⚖️ Code for the paper "Ethical Adversaries: Towards Mitigating Unfairness with Adversarial Machine Learning".☆11Dec 8, 2022Updated 3 years ago
- Official code for "Rethinking Chain-of-Thought Reasoning for Videos"☆20Dec 14, 2025Updated 2 months ago
- an atomatic notr maker for CYTUS☆11Mar 23, 2022Updated 3 years ago
- ☆18Sep 20, 2025Updated 5 months ago
- [MedIA 2026] Official implementation of TTGA: Test-Time Generative Augmentation for Medical Image Segmentation.☆12Jan 5, 2026Updated last month
- Official repository for ACM Multimedia'24 paper "MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube a…☆18Aug 11, 2024Updated last year
- 虚拟桌宠模拟器 Live2D动画组件☆12Jun 22, 2025Updated 8 months ago
- DSN jailbreak Attack & Evaluation Ensemble☆16Feb 7, 2026Updated 3 weeks ago
- [ICLR 2026] Official repo for "Spotlight on Token Perception for Multimodal Reinforcement Learning"☆49Jan 30, 2026Updated last month
- Official implementation of Visco-Attack (EMNLP 2025 Main). We will progressively release the code and one-click reproduction scripts.☆30Aug 22, 2025Updated 6 months ago
- code and data associated with CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations☆11Oct 13, 2023Updated 2 years ago
- ReasoningShield: Safety Detection over Reasoning Traces of Large Reasoning Models☆25Sep 27, 2025Updated 5 months ago
- A simple template for TensorFlow's highly efficient CudnnLSTM module☆11Jun 8, 2018Updated 7 years ago
- Code release for "A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adapt…☆13Mar 15, 2024Updated last year
- Code to reproduce 'MOCCA: Multi-Layer One-Class Classification for Anomaly Detection'☆10Dec 12, 2021Updated 4 years ago
- TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment☆10Mar 1, 2025Updated last year