[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
☆618 · Jun 24, 2025 · Updated 8 months ago
Alternatives and similar repositories for TrustLLM
Users interested in TrustLLM are comparing it to the libraries listed below.
- [ICLR'26, NAACL'25 Demo] Toolkit & Benchmark for evaluating the trustworthiness of generative foundation models. ☆127 · Aug 22, 2025 · Updated 6 months ago
- [NeurIPS 2024] HonestLLM: Toward an Honest and Helpful Large Language Model ☆29 · Jun 10, 2025 · Updated 8 months ago
- We jailbreak GPT-3.5 Turbo’s safety guardrails by fine-tuning it on only 10 adversarially designed examples, at a cost of less than $0.20… ☆341 · Feb 23, 2024 · Updated 2 years ago
- A curated list of safety-related papers, articles, and resources focused on Large Language Models (LLMs). This repository aims to provide… ☆1,783 · Feb 1, 2026 · Updated last month
- A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.). ☆1,870 · Feb 23, 2026 · Updated last week
- HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ☆864 · Aug 16, 2024 · Updated last year
- A Comprehensive Assessment of Trustworthiness in GPT Models ☆314 · Sep 16, 2024 · Updated last year
- Awesome Large Reasoning Model (LRM) Safety. This repository collects security-related research on large reasoning models such as … ☆82 · Feb 27, 2026 · Updated last week
- This repository provides an original implementation of Detecting Pretraining Data from Large Language Models by *Weijia Shi, *Anirudh Aji… ☆242 · Nov 3, 2023 · Updated 2 years ago
- JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track] ☆535 · Apr 4, 2025 · Updated 11 months ago
- Papers and resources related to the security and privacy of LLMs 🤖 ☆568 · Jun 8, 2025 · Updated 8 months ago
- [ICML 2024] COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability ☆176 · Dec 18, 2024 · Updated last year
- Accepted by ECCV 2024 ☆192 · Oct 15, 2024 · Updated last year
- RewardBench: the first evaluation tool for reward models. ☆697 · Feb 16, 2026 · Updated 2 weeks ago
- ☆24 · Dec 8, 2024 · Updated last year
- The papers are organized according to our survey "Evaluating Large Language Models: A Comprehensive Survey". ☆793 · May 8, 2024 · Updated last year
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆191 · Jan 16, 2025 · Updated last year
- Representation Engineering: A Top-Down Approach to AI Transparency ☆953 · Aug 14, 2024 · Updated last year
- This is the starter kit for the Trojan Detection Challenge 2023 (LLM Edition), a NeurIPS 2023 competition. ☆90 · May 19, 2024 · Updated last year
- [ICLR 2024] The official implementation of our ICLR 2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language M… ☆430 · Jan 22, 2025 · Updated last year
- A curated list of trustworthy deep learning papers, updated daily. ☆382 · Feb 20, 2026 · Updated 2 weeks ago
- Universal and Transferable Attacks on Aligned Language Models ☆4,534 · Aug 2, 2024 · Updated last year
- [AAAI'25 (Oral)] Jailbreaking Large Vision-language Models via Typographic Visual Prompts ☆192 · Jun 26, 2025 · Updated 8 months ago
- TruthfulQA: Measuring How Models Imitate Human Falsehoods ☆886 · Jan 16, 2025 · Updated last year
- ☆197 · Nov 26, 2023 · Updated 2 years ago
- Improving Alignment and Robustness with Circuit Breakers ☆258 · Sep 24, 2024 · Updated last year
- [ICML 2024] Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models. ☆85 · Jan 19, 2025 · Updated last year
- [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models ☆66 · Mar 8, 2025 · Updated 11 months ago
- Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025] ☆379 · Jan 23, 2025 · Updated last year
- The repository for the survey paper "Survey on Large Language Models Factuality: Knowledge, Retrieval and Domain-Specificity" ☆341 · Apr 25, 2024 · Updated last year
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆544 · Jan 17, 2025 · Updated last year
- ☆14 · Feb 26, 2025 · Updated last year
- ☆12 · Apr 22, 2024 · Updated last year
- Code for the paper "Universal Jailbreak Backdoors from Poisoned Human Feedback" ☆66 · Apr 24, 2024 · Updated last year
- ☆320 · Sep 18, 2024 · Updated last year
- ☆313 · Jun 9, 2024 · Updated last year
- ☆17 · Dec 21, 2023 · Updated 2 years ago
- Aligning Large Language Models with Human: A Survey ☆741 · Sep 11, 2023 · Updated 2 years ago
- A framework for few-shot evaluation of language models. ☆11,540 · Updated this week