declare-lab / trust-alignLinks

Codes and datasets for the paper Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

☆68

Alternatives and similar repositories for trust-align

Users that are interested in trust-align are comparing it to the libraries listed below

Sorting:

clinicalml / co-llm
Co-LLM: Learning to Decode Collaboratively with Multiple Language Models
☆123Updated last year
zjunlp / KnowledgeCircuits
[NeurIPS 2024] Knowledge Circuits in Pretrained Transformers
☆159Updated 3 weeks ago
MingLiiii / Layer_Gradient
[ACL'25 Oral] What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆74Updated 5 months ago
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆50Updated 9 months ago
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆112Updated 4 months ago
kevinwu23 / StanfordClashEval
☆37Updated 10 months ago
SalesforceAIResearch / FaithEval
☆54Updated 3 weeks ago
ReasoningTransfer / Transferability-of-LLM-Reasoning
☆104Updated last month
tianyang-x / SaySelf
Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"
☆109Updated last year
voidism / Lookback-Lens
Code for the EMNLP 2024 paper "Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps"
☆142Updated last month
weizhepei / InstructRAG
[ICLR 2025] InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
☆132Updated 10 months ago
stanfordnlp / axbench
Stanford NLP Python library for benchmarking the utility of LLM interpretability methods
☆150Updated 5 months ago
zjunlp / unlearn
[ACL 2025] Knowledge Unlearning for Large Language Models
☆46Updated 2 months ago
WindyLee0822 / Process_Q_Model
official implementation of paper "Process Reward Model with Q-value Rankings"
☆65Updated 10 months ago
shengliu66 / ICV
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
☆193Updated 9 months ago
YangLing0818 / SuperCorrect-llm
[ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
☆83Updated 8 months ago
LAMDASZ-ML / Self-Backtracking
☆51Updated 9 months ago
facebookresearch / ReasonIR
Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks".
☆209Updated 5 months ago
Glaciohound / LM-Steer
Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)
☆130Updated 4 months ago
ScalerLab / JudgeBench
☆105Updated last year
fangyuan-ksgk / CoT-Reasoning-without-Prompting
Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting
☆34Updated last year
Dereck0602 / Awesome_Test_Time_LLMs
☆134Updated 8 months ago
google-deepmind / bbeh
☆105Updated 7 months ago
HKUNLP / critic-rl
[ICML 2025] Teaching Language Models to Critique via Reinforcement Learning
☆118Updated 7 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆115Updated 5 months ago
TIGER-AI-Lab / CritiqueFineTuning
Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" [COLM 2025]
☆179Updated 5 months ago
TsinghuaC3I / SSRL
SSRL: Self-Search Reinforcement Learning
☆157Updated 3 months ago
DataArcTech / LLM-as-a-Judge
☆158Updated last month
yale-nlp / MCTS-RAG
Data and Code for EMNLP 2025 Findings Paper "MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search"
☆82Updated last month
Anni-Zou / DocBench
DocBench: A Benchmark for Evaluating LLM-based Document Reading Systems
☆59Updated last year