A Benchmark for Evaluating Safety and Trustworthiness in Web Agents for Enterprise Scenarios
☆21Mar 12, 2026Updated last month
Alternatives and similar repositories for ST-WebAgentBench
Users who are interested in ST-WebAgentBench are comparing it to the libraries listed below.
- ☆25Apr 3, 2024Updated 2 years ago
- ☆11Feb 28, 2024Updated 2 years ago
- LLMPerf is a library for validating and benchmarking LLMs☆11Aug 13, 2024Updated last year
- ☆13May 10, 2025Updated 11 months ago
- Ranking-Consistent Language-Image Pretraining☆12Oct 24, 2025Updated 5 months ago
- Deep Learning - Visual Representation Learning by solving Jigsaw puzzles using Deep Reinforcement Learning☆10Dec 8, 2016Updated 9 years ago
- Nethack Learning Environment Wrapper for Language Interface☆42Sep 11, 2023Updated 2 years ago
- This is the official code base of AgentNetTool in OpenCUA. Website: https://opencua.xlang.ai/☆45Sep 3, 2025Updated 7 months ago
- Let there be clock in the beach - WACV 2022☆15Nov 15, 2021Updated 4 years ago
- Using Vrep to simulate a six-legged robot to do motion planning & path planning☆10Jan 10, 2019Updated 7 years ago
- Code for Unsupervised multi-granular Chinese word segmentation and term discovery via graph partition [JBI]☆16Jan 28, 2022Updated 4 years ago
- TabSTAR: A Tabular Foundation Model for Tabular Data with Text Fields☆82Mar 15, 2026Updated last month
- Code and data for "An Accurate Unsupervised Method for Joint Entity Alignment and Dangling Entity Detection".☆15Mar 26, 2022Updated 4 years ago
- Text generation using language models with multiple exit heads☆16Sep 18, 2025Updated 7 months ago
- Code implementation of R^2-Guard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning☆22Jul 8, 2024Updated last year
- Codes for "Understanding MCMC Dynamics as Flows on the Wasserstein Space" (ICML-19)☆12Nov 17, 2019Updated 6 years ago
- [NeurIPS 2025] Official Implementation for "Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding"☆22Dec 8, 2024Updated last year
- ACL 2023 paper "A Critical Evaluation of Evaluations for Long-form Question Answering"☆21Mar 22, 2024Updated 2 years ago
- Repo for the paper "Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks".☆62Updated this week
- High performance SRMD implementation using CUDA.☆28Mar 28, 2023Updated 3 years ago
- ☆15Aug 7, 2025Updated 8 months ago
- established for the data normalization and reinforcement learning training scheme to train an agent in DCS world☆12Oct 22, 2021Updated 4 years ago
- Official repository for "On the Multi-modal Vulnerability of Diffusion Models"☆16Jul 15, 2024Updated last year
- Secure file transfer system based on RSA+AES+SSL☆20Jul 14, 2020Updated 5 years ago
- ☆26May 30, 2023Updated 2 years ago
- Code for AISTATS 2023 paper - Estimating Total Correlation with Mutual Information Estimators☆17Dec 15, 2023Updated 2 years ago
- [NeurIPS'25 Spotlight] MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation☆21Feb 23, 2025Updated last year
- Code for "On Measuring Faithfulness of Natural Language Explanations"☆22Jul 23, 2024Updated last year
- Unofficial LaTeX templates for master's and doctoral theses at the National Taiwan University College of Electrical Engineering and Computer Science, and for IEEE conference papers☆32Feb 9, 2025Updated last year
- Code for the AAAI 2020 oral paper - Dynamic Embedding on Textual Networks via a Gaussian Process.☆12Mar 26, 2020Updated 6 years ago
- AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies☆30Aug 14, 2024Updated last year
- SafeArena is a benchmark for assessing the harmful capabilities of web agents☆21Apr 23, 2025Updated 11 months ago
- mcp scan that scans any mcp server for indirect attack vectors and security or configuration vulnerabilities☆85Mar 30, 2026Updated 2 weeks ago
- ☆31Oct 23, 2024Updated last year
- ☆21Apr 23, 2025Updated 11 months ago
- A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o…☆28Aug 7, 2025Updated 8 months ago
- A Python library for guardrail models evaluation.☆35Oct 9, 2025Updated 6 months ago
- Summary of recent news recommendation papers.☆25Feb 2, 2022Updated 4 years ago
- The collection of papers about Private Evolution☆18Mar 23, 2026Updated 3 weeks ago