guyuntian / CoT_benchmark
Code for "Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective"
☆18 · Updated last year
Alternatives and similar repositories for CoT_benchmark:
Users interested in CoT_benchmark are comparing it to the repositories listed below:
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives". ☆18 · Updated 3 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆19 · Updated last year
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆33 · Updated 6 months ago
- ☆34 · Updated last year
- ☆30 · Updated 4 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆49 · Updated 4 months ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024 ☆21 · Updated 7 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆42 · Updated 6 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆71 · Updated 7 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆66 · Updated 6 months ago
- ThinK: Thinner Key Cache by Query-Driven Pruning ☆15 · Updated last week
- GenRM-CoT: Data release for verification rationales ☆47 · Updated 4 months ago
- ☆25 · Updated 9 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆115 · Updated 5 months ago
- Official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆23 · Updated 5 months ago
- Code for "Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning" ☆15 · Updated 11 months ago
- [ICLR 2025] Code & data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆12 · Updated 8 months ago
- ☆37 · Updated last year
- ☆27 · Updated 3 months ago
- ☆49 · Updated last year
- [ICLR 2025] Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization ☆19 · Updated 3 weeks ago
- Official repo for "Towards Uncertainty-Aware Language Agent" ☆24 · Updated 6 months ago
- ☆25 · Updated last year
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆31 · Updated 7 months ago
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024] ☆17 · Updated 9 months ago
- ☆14 · Updated 11 months ago
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆39 · Updated 3 months ago
- Source code for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023) ☆15 · Updated last month
- Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts ☆16 · Updated 11 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity ☆62 · Updated 3 months ago