[ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models.
☆127 · Aug 22, 2025 · Updated 6 months ago
Alternatives and similar repositories for TrustEval-toolkit
Users who are interested in TrustEval-toolkit are comparing it to the repositories listed below.
- NeurIPS 2025 Poster ☆26 · Feb 4, 2025 · Updated last year
- [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models ☆66 · Mar 8, 2025 · Updated 11 months ago
- ☆19 · May 14, 2025 · Updated 9 months ago
- NLPBench: Evaluating NLP-Related Problem-solving Ability in Large Language Models ☆10 · Oct 27, 2023 · Updated 2 years ago
- [EMNLP 2023] Explainable Claim Verification via Knowledge-Grounded Reasoning with Large Language Models ☆26 · Dec 13, 2023 · Updated 2 years ago
- ☆15 · Aug 30, 2025 · Updated 6 months ago
- This repository contains a PyTorch implementation of the ICSE'26 paper "Scrub It Out! Erasing Sensitive Memorization in Code Language Mod… ☆30 · Sep 18, 2025 · Updated 5 months ago
- [AAAI-26] Are We on the Right Way for Assessing Document Retrieval-Augmented Generation? ☆26 · Dec 14, 2025 · Updated 2 months ago
- [ICCV 2023] MADAug: When to Learn What: Model-Adaptive Data Augmentation Curriculum ☆19 · Nov 9, 2023 · Updated 2 years ago
- AutoHallusion Codebase (EMNLP 2024) ☆22 · Dec 6, 2024 · Updated last year
- This work corroborates a run-time Trojan detection method exploiting STRong Intentional Perturbation of inputs, is a multi-domain Trojan … ☆10 · Mar 7, 2021 · Updated 4 years ago
- Code for paper: Optimizing Length Compression in Large Reasoning Models ☆27 · Oct 20, 2025 · Updated 4 months ago
- ☆30 · Feb 18, 2025 · Updated last year
- ☆32 · Jul 11, 2024 · Updated last year
- [TMLR'25] AutoTrust, a groundbreaking benchmark designed to assess the trustworthiness of DriveVLMs. This work aims to enhance public saf… ☆53 · Nov 20, 2025 · Updated 3 months ago
- [CIKM 2024] Trojan Activation Attack: Attack Large Language Models using Activation Steering for Safety-Alignment. ☆29 · Jul 29, 2024 · Updated last year
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆31 · Aug 7, 2025 · Updated 7 months ago
- Code for the ACM MM paper "Backdoor Attack on Crowd Counting" ☆17 · Jul 10, 2022 · Updated 3 years ago
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations" ☆15 · Feb 8, 2024 · Updated 2 years ago
- [NeurIPS 24] Can LLMs Solve Molecule Puzzles? A Multimodal Benchmark for Molecular Structure Elucidation ☆18 · Jan 2, 2026 · Updated 2 months ago
- ☆23 · May 21, 2025 · Updated 9 months ago
- This repository contains data of TruthSocial posts related to the 2024 U.S. Elections ☆12 · Nov 1, 2024 · Updated last year
- [CCS 2024] Optimization-based Prompt Injection Attack to LLM-as-a-Judge ☆39 · Sep 17, 2025 · Updated 5 months ago
- [ICML25] CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale ☆25 · Jul 31, 2025 · Updated 7 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆73 · Nov 27, 2024 · Updated last year
- XL-VLMs: General Repository for eXplainable Large Vision Language Models ☆46 · Sep 8, 2025 · Updated 5 months ago
- Official Code for ACL 2024 paper "GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis" ☆66 · Oct 27, 2024 · Updated last year
- ☆20 · Jul 16, 2024 · Updated last year
- An implementation for MLLM oversensitivity evaluation ☆17 · Nov 16, 2024 · Updated last year
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆864 · Aug 16, 2024 · Updated last year
- This is the code for semi-supervised robust training (SRT). ☆18 · Mar 24, 2023 · Updated 2 years ago
- [ICCV 2023] Subclass-balancing contrastive learning for long-tailed recognition ☆18 · Oct 30, 2023 · Updated 2 years ago
- ☆20 · May 28, 2025 · Updated 9 months ago
- ☆178 · Oct 31, 2025 · Updated 4 months ago
- ☆21 · Jul 26, 2025 · Updated 7 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆89 · Sep 26, 2024 · Updated last year
- This is a Python script to generate a nice BibTeX file for LaTeX. ☆18 · Mar 1, 2020 · Updated 6 years ago
- ☆20 · Mar 14, 2022 · Updated 3 years ago
- [NeurIPS 2021] Better Safe Than Sorry: Preventing Delusive Adversaries with Adversarial Training ☆32 · Jan 9, 2022 · Updated 4 years ago