GAIR-NLP / Safety-JLinks

Safety-J: Evaluating Safety with Critique

☆16

Alternatives and similar repositories for Safety-J

Users that are interested in Safety-J are comparing it to the libraries listed below

Sorting:

GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆34Updated last year
GAIR-NLP / alignment-for-honesty
☆75Updated last year
GAIR-NLP / self-improvement-reversal
☆13Updated last year
RUCAIBox / HaluEval-2.0
☆47Updated last year
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆63Updated last year
zzli2022 / TLDR
Code for Research Project TLDR
☆23Updated 3 months ago
ADaM-BJTU / W2SG
The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”
☆17Updated last year
ssmisya / PRMBench
[ACL' 25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models.
☆81Updated 8 months ago
princeton-nlp / MQuAKE
[EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
☆117Updated last year
zhaochen0110 / conflictbank
Code and data for "ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM" (NeurIPS 2024 Track Datasets and…
☆52Updated 5 months ago
TianHongZXY / RLVR-Decomposed
[NeurIPS 2025] Implementation for the paper "The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning"
☆115Updated 2 months ago
GAIR-NLP / weak-to-strong-reasoning
☆58Updated last year
KbsdJames / Omni-MATH
The official repository of the Omni-MATH benchmark.
☆88Updated 10 months ago
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆73Updated 3 weeks ago
hanningzhang / prm
☆17Updated 11 months ago
zepingyu0512 / neuron-attribution
code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models
☆45Updated 11 months ago
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆82Updated 9 months ago
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆66Updated 11 months ago
PremiLab-Math / MathCheck
[ICLR 2025] Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist
☆33Updated last year
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆69Updated last year
October2001 / ProLong
[ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models
☆57Updated last year
SeekingDream / Static-to-Dynamic-LLMEval
The official GitHub repository of the paper "Recent advances in large langauge model benchmarks against data contamination: From static t…
☆45Updated last month
rookie-joe / AutoPSV
☆50Updated last year
KbsdJames / omni-math-rule
The rule-based evaluation subset and code implementation of Omni-MATH
☆24Updated 10 months ago
F2-Song / ICDPO
The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization…
☆16Updated last year
SihengLi99 / LLM-Honesty-Survey
[2025-TMLR] A Survey on the Honesty of Large Language Models
☆61Updated 10 months ago
MikaStars39 / FeatureAlignment
FeatureAlignment = Alignment + Mechanistic Interpretability
☆31Updated 7 months ago
NovelQA / novelqa.github.io
☆21Updated 10 months ago
OSU-NLP-Group / LLM-Knowledge-Conflict
[ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"
☆77Updated last year
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆83Updated last year