ShuoTang123 / MATRIX
Implementation of the MATRIX framework (ICML 2024)
☆51 Updated last year
Alternatives and similar repositories for MATRIX:
Users who are interested in MATRIX are comparing it to the repositories listed below
- ☆25 Updated 11 months ago
- ☆43 Updated 3 months ago
- ☆36 Updated 7 months ago
- ☆106 Updated this week
- ☆32 Updated 6 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications ☆76 Updated last month
- The repository of the paper "REEF: Representation Encoding Fingerprints for Large Language Models," aims to protect the IP of open-source… ☆43 Updated 3 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆21 Updated 2 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆76 Updated 8 months ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation" ☆16 Updated this week
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆45 Updated this week
- ☆45 Updated 6 months ago
- Official repository for 'Safety Challenges in Large Reasoning Models: A Survey' - Exploring safety risks, attacks, and defenses for Large… ☆27 Updated last week
- PyTorch implementation of Tree Preference Optimization (TPO) (Accepted at ICLR'25) ☆17 Updated 2 weeks ago
- ☆42 Updated 6 months ago
- Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety ☆37 Updated 2 months ago
- ☆10 Updated 2 weeks ago
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct ☆171 Updated 3 months ago
- ☆59 Updated 3 weeks ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆67 Updated 2 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆119 Updated last month
- This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba… ☆25 Updated last month
- ☆56 Updated last week
- This is the official implementation of the paper "S²R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning" ☆62 Updated 2 weeks ago
- [ICLR 2025] Official codebase for the ICLR 2025 paper "Multimodal Situational Safety" ☆14 Updated 2 months ago
- Official Implementation for EMNLP 2024 (main) "AgentReview: Exploring Academic Peer Review with LLM Agent." ☆57 Updated 6 months ago
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024 ☆73 Updated 7 months ago
- [ACL'24] Chain of Thought (CoT) is significant in improving the reasoning abilities of large language models (LLMs). However, the correla… ☆46 Updated 2 months ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆36 Updated 9 months ago
- ☆28 Updated 10 months ago