tmlr-group / NoisyRationales

[NeurIPS 2024] "Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales?"

☆37

Alternatives and similar repositories for NoisyRationales

Users who are interested in NoisyRationales are comparing it to the repositories listed below.

tmlr-group / G-effect
[ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond"
☆13 Updated 8 months ago
tmlr-group / AR-Bench
[ICML 2025] "From Passive to Active Reasoning: Can Large Language Models Ask the Right Questions under Incomplete Information?"
☆47 Updated last month
bethgelab / sober-reasoning
A Sober Look at Language Model Reasoning
☆87 Updated this week
ChnQ / MI-Peaks
☆55 Updated 4 months ago
boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆86 Updated 7 months ago
VITA-Group / SEAL
[COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆45 Updated 7 months ago
swj0419 / muse_bench
☆30 Updated 8 months ago
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆62 Updated last year
Lingkai-Kong / RE-Control
Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective
☆34 Updated 9 months ago
Alsace08 / Chain-of-Embedding
[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
☆83 Updated 11 months ago
jinhaoduan / SAR
[ACL 2024] Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models
☆59 Updated last year
nik-dim / tall_masks
Official repository of "Localizing Task Information for Improved Model Merging and Compression" [ICML 2024]
☆51 Updated last year
glorgao / SelectiveDPO
Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
☆44 Updated 4 months ago
deeplearning-wisc / haloscope
source code for NeurIPS'24 paper "HaloScope: Harnessing Unlabeled LLM Generations for Hallucination Detection"
☆61 Updated 7 months ago
ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆111 Updated last month
git-disl / Booster
This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba…
☆33 Updated 7 months ago
uw-nsl / safechain
[ACL 25] SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
☆25 Updated 7 months ago
EnnengYang / AdaMerging
AdaMerging: Adaptive Model Merging for Multi-Task Learning. ICLR, 2024.
☆96 Updated last year
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS2024)
☆47 Updated last year
thu-ml / STAIR
Official codebase for "STAIR: Improving Safety Alignment with Introspective Reasoning"
☆85 Updated 8 months ago
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as …
☆76 Updated this week
WanliYoung / Revisit-Editing-Evaluation
Code and data repository for "The Mirage of Model Editing: Revisiting Evaluation in the Wild"
☆16 Updated 2 months ago
princeton-nlp / benign-data-breaks-safety
☆41 Updated last year
Persdre / NeurIPS-2024-LLM-Papers
Accepted LLM Papers in NeurIPS 2024
☆37 Updated last year
Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆163 Updated 6 months ago
ybwang119 / Awesome-reasoning-safety
This repo is for the safety topic, including attacks, defenses and studies related to reasoning and RL
☆52 Updated 2 months ago
ZHZisZZ / modpo
[ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
☆92 Updated last year
git-disl / Safety-Tax
This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
☆25 Updated 8 months ago
AlexanderVNikitin / kernel-language-entropy
Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)
☆32 Updated 11 months ago
Model-GLUE / Model-GLUE
☆18 Updated last year