Jarviswang94 / Multilingual_safety_benchmark

Multilingual safety benchmark for Large Language Models

☆52

Alternatives and similar repositories for Multilingual_safety_benchmark

Users who are interested in Multilingual_safety_benchmark are comparing it to the repositories listed below.

Jarviswang94 / MTTM
MTTM: Metamorphic Testing for Textual Content Moderation Software
☆32 Updated 2 years ago
yxwan123 / BiasAsker
☆40 Updated 9 months ago
CUHK-ARISE / EmotionBench
Benchmarking LLMs' Emotional Alignment with Humans
☆113 Updated 8 months ago
pillowsofwind / Course-Correction
[EMNLP 2024] The official GitHub repo for the paper "Course-Correction: Safety Alignment Using Synthetic Preferences"
☆19 Updated last year
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆69 Updated last year
CUHK-ARISE / PsychoBench
Benchmarking LLMs' Psychological Portrayal
☆124 Updated 9 months ago
GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆34 Updated last year
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆42 Updated 6 months ago
penguinnnnn / awesome-llm-and-society
Recent papers on (1) Psychology of LLMs; (2) Biases in LLMs.
☆49 Updated last year
RUCAIBox / HaluEval-2.0
☆47 Updated last year
OpenSafetyLab / SALAD-BENCH
【ACL 2024】 SALAD benchmark & MD-Judge
☆161 Updated 7 months ago
GAIR-NLP / alignment-for-honesty
☆75 Updated last year
zthang / Focus
☆20 Updated last year
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆82 Updated 8 months ago
SeekingDream / Static-to-Dynamic-LLMEval
The official GitHub repository of the paper "Recent advances in large language model benchmarks against data contamination: From static t…
☆45 Updated last month
PKU-Alignment / beavertails
BeaverTails is a collection of datasets designed to facilitate research on safety alignment in large language models (LLMs).
☆160 Updated last year
zepingyu0512 / neuron-attribution
Code for the EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆43 Updated 10 months ago
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆63 Updated last year
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆23 Updated 9 months ago
CUHK-ARISE / GAMABench
Benchmarking LLMs' Gaming Ability in Multi-Agent Environments
☆88 Updated 5 months ago
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆59 Updated 10 months ago
pkunlp-icler / IKE
☆26 Updated 2 years ago
alisawuffles / proxy-tuning
Code associated with Tuning Language Models by Proxy (Liu et al., 2024)
☆120 Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆69 Updated this week
CriticBench / CriticBench
[ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
☆27 Updated last year
lancopku / label-words-are-anchors
Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning
☆165 Updated last year
FreedomIntelligence / OVM
☆69 Updated last year
chujiezheng / LLM-Safeguard
Official repository for ICML 2024 paper "On Prompt-Driven Safeguarding for Large Language Models"
☆97 Updated 4 months ago
jinzhuoran / RWKU
RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024
☆83 Updated last year
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆83 Updated last year