SafeArena is a benchmark for assessing the harmful capabilities of web agents
☆23 · Apr 23, 2025 · Updated last year
Alternatives and similar repositories for safearena
Users who are interested in safearena are comparing it to the libraries listed below.
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models ☆19 · Aug 17, 2025 · Updated 10 months ago
- Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20… ☆28 · Nov 2, 2023 · Updated 2 years ago
- AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories ☆45 · Aug 7, 2025 · Updated 10 months ago
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents ☆25 · May 17, 2026 · Updated last month
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆622 · Oct 7, 2025 · Updated 8 months ago
- virtual node analysis on ogb benchmark dataset ☆14 · Mar 9, 2023 · Updated 3 years ago
- This repository contains the dataset and code for our ACL'23 publication: "MatSci-NLP: Evaluating Scientific Language Models on Materials… ☆17 · Nov 21, 2023 · Updated 2 years ago
- The official Genbench Collaborative Benchmarking Task repository 2023 (Archived) ☆14 · Jul 23, 2024 · Updated last year
- ☆11 · Feb 19, 2023 · Updated 3 years ago
- ACL 2020 papers by authors who are members of underrepresented groups (URMs) ☆16 · Jul 10, 2020 · Updated 5 years ago
- This is the repository of the Dense Hierarchical Retrieval for Open-Domain Question Answering ☆14 · Dec 23, 2021 · Updated 4 years ago
- ☆25 · Jan 22, 2025 · Updated last year
- Write applications to charge money to your friends after you paid the whole bill by easily parsing the receipt 💸 ☆14 · Oct 22, 2022 · Updated 3 years ago
- More Information about Features, Deliverables and Publications @ ☆11 · May 17, 2016 · Updated 10 years ago
- ☆10 · Aug 22, 2022 · Updated 3 years ago
- ☆13 · Apr 7, 2024 · Updated 2 years ago
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 · Jul 13, 2022 · Updated 3 years ago
- PyTorch reimplementation of REALM and ORQA ☆22 · Feb 3, 2022 · Updated 4 years ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆87 · Aug 12, 2024 · Updated last year
- Competitive Programming ☆13 · Feb 25, 2026 · Updated 3 months ago
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and … ☆12 · Jun 18, 2024 · Updated 2 years ago
- Data splits for the NAACL 2016 paper ☆22 · Mar 17, 2016 · Updated 10 years ago
- We introduce EMMET and unify model editing with popular algorithms ROME and MEMIT. ☆29 · Dec 16, 2024 · Updated last year
- Decorate all methods of a class using a single class decorator ☆28 · Nov 9, 2022 · Updated 3 years ago
- Implementation of the Mask R-CNN model using OCaml's numerical library Owl. ☆19 · Jan 30, 2020 · Updated 6 years ago
- Fetch build artifacts from CircleCI. ☆21 · Mar 15, 2022 · Updated 4 years ago
- Project of ACL 2025 "UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models" ☆14 · Mar 25, 2025 · Updated last year
- Curated list of awesome ML Visualization Libraries ☆14 · Jun 23, 2023 · Updated 2 years ago
- A collection of functions to help you easily train and run Tensorflow Keras. It includes 1-line auto-TPU support, GPU memory management, … ☆12 · Jul 6, 2022 · Updated 3 years ago
- Official repository for WWW'24 paper "MemeCraft: Contextual and Stance-Driven Multimodal Meme Generation" ☆12 · Jul 25, 2024 · Updated last year
- ACL 2023 paper "A Critical Evaluation of Evaluations for Long-form Question Answering" ☆21 · Mar 22, 2024 · Updated 2 years ago
- ☆17 · Feb 17, 2025 · Updated last year
- [WSDM'2025] "MixRec: Heterogeneous Graph Collaborative Filtering" ☆21 · Dec 19, 2024 · Updated last year
- ☆26 · Jan 5, 2026 · Updated 5 months ago
- Unofficial LaTeX templates for master's and doctoral theses and IEEE conference papers at the College of Electrical Engineering and Computer Science, National Taiwan University ☆33 · Feb 9, 2025 · Updated last year
- [ICLR 2026] The official code for "Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models" ☆27 · Feb 7, 2026 · Updated 4 months ago
- Aurora is a central design system for all products and applications for the Open, Accessible Digital Workspace. This repo is for all code… ☆15 · Feb 23, 2024 · Updated 2 years ago
- [SIGIR'22] Official PyTorch implementation for "Learning to Denoise Unreliable Interactions for Graph Collaborative Filtering". ☆18 · Oct 24, 2022 · Updated 3 years ago
- ☆11 · Oct 15, 2023 · Updated 2 years ago