ZhentingWang / DUMP
☆16Updated last week
Alternatives and similar repositories for DUMP
Users that are interested in DUMP are comparing it to the libraries listed below
- ☆14Updated 2 months ago
- An official implementation of "Catastrophic Failure of LLM Unlearning via Quantization" (ICLR 2025)☆27Updated 2 months ago
- ☆29Updated 10 months ago
- Code for "A Sober Look at Progress in Language Model Reasoning" paper☆45Updated this week
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆44Updated last month
- ☆31Updated last year
- [ICML 2023] "Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?" by Ruisi Cai, Zhenyu Zhang, Zhangyang Wang☆16Updated 2 years ago
- ☆31Updated 4 months ago
- ☆15Updated last month
- ☆25Updated 11 months ago
- AIR-Bench 2024 is a safety benchmark that aligns with emerging government regulations and company policies☆20Updated 9 months ago
- ☆21Updated last month
- ☆19Updated 10 months ago
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Updated last year
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"☆10Updated last month
- What Makes a Reward Model a Good Teacher? An Optimization Perspective☆28Updated last month
- ☆27Updated last year
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]☆30Updated 3 months ago
- ☆14Updated last year
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety"☆15Updated 2 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"☆15Updated last month
- Exploration of automated dataset selection approaches at large scales.☆40Updated 2 months ago
- Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆15Updated 2 months ago
- Codebase for decoding compressed trust.☆23Updated last year
- ☆20Updated 2 months ago
- ☆16Updated 8 months ago
- Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models"☆19Updated last week
- ☆54Updated 2 years ago
- Official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and…☆58Updated last month
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)☆78Updated 6 months ago