[COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆52 · Apr 6, 2025 · Updated 10 months ago
Alternatives and similar repositories for SEAL
Users interested in SEAL are comparing it to the libraries listed below.
- [EMNLP 25] An effective and interpretable weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study un… ☆17 · Dec 17, 2025 · Updated 2 months ago
- Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs" ☆30 · Oct 12, 2025 · Updated 4 months ago
- ☆19 · Aug 4, 2025 · Updated 6 months ago
- ☆18 · Aug 19, 2024 · Updated last year
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails ☆31 · Feb 26, 2025 · Updated last year
- Official implementation for "Mixture of In-Context Experts Enhance LLMs’ Awareness of Long Contexts" (accepted at NeurIPS 2024) ☆13 · Jan 7, 2025 · Updated last year
- Inverse Scaling in Test-Time Compute ☆25 · Dec 3, 2025 · Updated 2 months ago
- Official repo of "Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents" ☆59 · Oct 28, 2025 · Updated 4 months ago
- Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as … ☆82 · Updated this week
- Code for "Reducing Hallucinations in Vision-Language Models via Latent Space Steering" ☆103 · Nov 23, 2024 · Updated last year
- [NeurIPS'22] Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork. Haotao Wang, Junyuan Hong,… ☆15 · Nov 27, 2023 · Updated 2 years ago
- Improving Alignment and Robustness with Circuit Breakers ☆258 · Sep 24, 2024 · Updated last year
- ☆16 · Feb 8, 2024 · Updated 2 years ago
- Preparing for ML Interviews. ☆54 · Jan 12, 2026 · Updated last month
- [ICLR 2025] "Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond" ☆17 · Feb 27, 2025 · Updated last year
- Official code for "Understanding Deep Gradient Leakage via Inversion Influence Functions", NeurIPS 2023 ☆15 · Oct 13, 2023 · Updated 2 years ago
- An extension to the GaLore paper to perform natural gradient descent in a low-rank subspace ☆18 · Oct 21, 2024 · Updated last year
- Paper list for the paper "Authorship Attribution in the Era of Large Language Models: Problems, Methodologies, and Challenges (SIGKDD Exp… ☆18 · Dec 23, 2024 · Updated last year
- This repo is for the safety topic, including attacks, defenses, and studies related to reasoning and RL ☆61 · Sep 5, 2025 · Updated 5 months ago
- This is the official code for our paper "Simple and Scalable Nearest Neighbor Machine Translation" (ICLR 2023). ☆14 · Nov 22, 2023 · Updated 2 years ago
- This is the official repository for our NeurIPS'22 paper "Watermarking for Out-of-distribution Detection." ☆18 · Feb 24, 2023 · Updated 3 years ago
- Private Adaptive Optimization with Side Information (ICML '22) ☆16 · Jun 23, 2022 · Updated 3 years ago
- ☆19 · Jun 21, 2025 · Updated 8 months ago
- [SaTML 2024] Shake to Leak: Fine-tuning Diffusion Models Can Amplify the Generative Privacy Risk ☆16 · Mar 15, 2025 · Updated 11 months ago
- [ICLR'26 Oral] RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments ☆34 · Feb 9, 2026 · Updated 3 weeks ago
- Code for the ICLR 2022 paper "Salient ImageNet: How to discover spurious features in deep learning?" ☆41 · Aug 19, 2022 · Updated 3 years ago
- ☆40 · Jun 11, 2025 · Updated 8 months ago
- Official implementation of the paper "Stochastic Optimization of Areas Under Precision-Recall Curves with Provable Convergence" published… ☆21 · Oct 9, 2023 · Updated 2 years ago
- [ICLR 2022] Official code repository for "TRGP: Trust Region Gradient Projection for Continual Learning" ☆22 · Oct 5, 2022 · Updated 3 years ago
- [NeurIPS 2024] Can Language Models Learn to Skip Steps? ☆22 · Jan 25, 2025 · Updated last year
- GitHub repo for the NeurIPS 2024 paper "Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models" ☆26 · Dec 21, 2025 · Updated 2 months ago
- A new algorithm that formulates jailbreaking as a reasoning problem. ☆26 · Jul 2, 2025 · Updated 8 months ago
- [ICML 2023] Revisiting Data-Free Knowledge Distillation with Poisoned Teachers ☆23 · Jul 7, 2024 · Updated last year
- Representation Surgery for Multi-Task Model Merging. ICML, 2024. ☆47 · Oct 10, 2024 · Updated last year
- Not All Poisons are Created Equal: Robust Training against Data Poisoning (ICML 2022) ☆22 · Aug 8, 2022 · Updated 3 years ago
- ☆32 · Aug 9, 2024 · Updated last year
- Lightweight Adapting for Black-Box Large Language Models ☆25 · Feb 15, 2024 · Updated 2 years ago
- [ICLR 2026] "Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models" ☆30 · Feb 4, 2026 · Updated 3 weeks ago
- [NeurIPS 2022] "Randomized Channel Shuffling: Minimal-Overhead Backdoor Attack Detection without Clean Datasets" by Ruisi Cai*, Zhenyu Zh… ☆21 · Oct 1, 2022 · Updated 3 years ago