microsoft / ValueCompass

☆29

Alternatives and similar repositories for ValueCompass

Users interested in ValueCompass are comparing it to the libraries listed below.

AmourWaltz / Reliable-LLM
☆174 · Updated last year
OpenSafetyLab / SALAD-BENCH
【ACL 2024】 SALAD benchmark & MD-Judge
☆166 · Updated 8 months ago
ydyjya / LLM-IHS-Explanation
☆55 · Updated last year
niconi19 / LLM-Conversation-Safety
[NAACL 2024] Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
☆107 · Updated last year
zepingyu0512 / awesome-LLM-neuron
☆33 · Updated 5 months ago
OSU-NLP-Group / AgentSafety
☆139 · Updated last month
Hongcheng-Gao / Awesome-Long2short-on-LRMs
Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…
☆255 · Updated 3 months ago
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆100 · Updated 6 months ago
PKU-Alignment / beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
☆167 · Updated 2 years ago
pillowsofwind / Knowledge-Conflicts-Survey
[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"
☆148 · Updated last year
shizhl / Multi-Agent-Papers
The awesome agents in the era of large language models
☆69 · Updated 2 years ago
SeekingDream / Static-to-Dynamic-LLMEval
The official GitHub repository of the paper "Recent advances in large language model benchmarks against data contamination: From static t…
☆47 · Updated 2 months ago
kevinyaobytedance / llm_unlearn
LLM Unlearning
☆177 · Updated 2 years ago
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆177 · Updated last year
Jihuai-wpy / InferAligner
☆37 · Updated last year
hzy312 / Awesome-LLM-Watermark
UP-TO-DATE LLM Watermark paper. 🔥🔥🔥
☆365 · Updated 11 months ago
cooperleong00 / Awesome-LLM-Interpretability
A curated list of LLM interpretability-related material: tutorials, libraries, surveys, papers, blogs, etc.
☆285 · Updated 8 months ago
Xianjun-Yang / Awesome_papers_on_LLMs_detection
The latest papers on detecting LLM-generated text and code
☆281 · Updated 5 months ago
zhenyu-02 / LogitLens4LLMs
A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab…
☆130 · Updated 3 months ago
TrustGen / TrustEval-toolkit
Toolkit for evaluating the trustworthiness of generative foundation models.
☆123 · Updated 3 months ago
Arstanley / Awesome-Trustworthy-RAG
☆92 · Updated 4 months ago
Lordog / R-Judge
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024)
☆92 · Updated 6 months ago
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as …
☆78 · Updated this week
yubol-bobo / Awesome-Multi-Turn-LLMs
This is the official GitHub repository for our survey paper "Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language …
☆143 · Updated 6 months ago
zepingyu0512 / awesome-SAE
awesome SAE papers
☆60 · Updated 6 months ago
LuckyyySTA / Awesome-LLM-hallucination
LLM hallucination paper list
☆327 · Updated last year
zchuz / CoT-Reasoning-Survey
[ACL 2024] A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
☆470 · Updated 10 months ago
thu-coai / SafetyBench
Official GitHub repo for SafetyBench, a comprehensive benchmark to evaluate LLMs' safety. [ACL 2024]
☆264 · Updated 4 months ago
Unispac / shallow-vs-deep-alignment
Official repository for the paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆165 · Updated 7 months ago
zepingyu0512 / awesome-llm-understanding-mechanism
awesome papers in LLM interpretability
☆584 · Updated 3 months ago