avalonstrel / Mitigating-the-Alignment-Tax-of-RLHF
☆16 · Feb 8, 2024 · Updated 2 years ago
Alternatives and similar repositories for Mitigating-the-Alignment-Tax-of-RLHF
Users who are interested in Mitigating-the-Alignment-Tax-of-RLHF are comparing it to the repositories listed below
- ☆24 · Aug 7, 2025 · Updated 6 months ago
- This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturba… ☆36 · Mar 22, 2025 · Updated 10 months ago
- Code for “SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation (ICLR 2025)” ☆24 · Oct 23, 2025 · Updated 3 months ago
- Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning ☆26 · Oct 4, 2025 · Updated 4 months ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation" ☆21 · Jan 31, 2026 · Updated 2 weeks ago
- ☆22 · Feb 26, 2024 · Updated last year
- GitHub resource repository for the book "혼자 만들면서 공부하는 파이썬" (Learning Python on Your Own by Building Things) ☆14 · Jan 14, 2026 · Updated last month
- ☆24 · Dec 8, 2024 · Updated last year
- Official repo of Progressive Data Expansion: data, code and evaluation ☆29 · Nov 16, 2023 · Updated 2 years ago
- Official code of paper "Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models" ☆86 · May 27, 2025 · Updated 8 months ago
- ☆35 · Sep 13, 2023 · Updated 2 years ago
- [COLM 2025] SEAL: Steerable Reasoning Calibration of Large Language Models for Free ☆52 · Apr 6, 2025 · Updated 10 months ago
- LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters ☆48 · Aug 2, 2025 · Updated 6 months ago
- ☆51 · Oct 23, 2023 · Updated 2 years ago
- ☆14 · Mar 7, 2025 · Updated 11 months ago
- Token-free Language Modeling with ByGPT5 & Friends! ☆12 · Jul 18, 2025 · Updated 6 months ago
- NLP daily-commit challenge on GitHub ("planting grass"), season 5 ☆10 · Aug 19, 2024 · Updated last year
- ☆10 · Jul 6, 2023 · Updated 2 years ago
- [AAMAS 2025] Privacy-preserving and Personalized RLHF, with convergence guarantees. The Code contains experiments for training multiple i… ☆14 · Apr 16, 2025 · Updated 10 months ago
- ☆13 · Aug 28, 2024 · Updated last year
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated last year
- Natural language processing library for Korean literary (novel) text. (Will be open in February, 2024) ☆11 · Jan 16, 2024 · Updated 2 years ago
- Identification of the Adversary from a Single Adversarial Example (ICML 2023) ☆10 · Jul 15, 2024 · Updated last year
- CoCoFL: Communication- and Computation-Aware Federated Learning via Partial NN Freezing and Quantization ☆13 · Aug 3, 2024 · Updated last year
- [EMNLP 2024] To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models ☆47 · Jan 23, 2025 · Updated last year
- a robust metric (robust fidelity) for XGNN (ICLR24) ☆12 · Jun 3, 2025 · Updated 8 months ago
- Improving Symbolic Music Generation with Inference-Time Alignment ☆20 · Aug 2, 2025 · Updated 6 months ago
- Code for the paper "SizeShiftReg: a Regularization Method for Improving Size-Generalization in Graph Neural Networks" ☆12 · Jan 17, 2023 · Updated 3 years ago
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion" ☆14 · Mar 28, 2024 · Updated last year
- Official repository for Trustworthy Alignment of Retrieval-Augmented Large Language Models via Reinforcement Learning ☆12 · Sep 2, 2024 · Updated last year
- [TACL 2024] Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis ☆11 · Nov 14, 2024 · Updated last year
- ☆14 · Feb 26, 2025 · Updated 11 months ago
- [AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie… ☆10 · Feb 7, 2026 · Updated last week
- Speech Security and Privacy Compendium - Mini ☆10 · Jun 18, 2024 · Updated last year
- The 4th rank system of the SemEval 2021 Task4. ☆10 · May 7, 2022 · Updated 3 years ago
- Applications for OpenCL testing on Toradex Apalis iMX6Q ☆12 · Dec 2, 2022 · Updated 3 years ago
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search ☆17 · Jan 24, 2026 · Updated 3 weeks ago
- Supplementary support repository for the book 《7가지 프로젝트로 배우는 LLM AI 에이전트 개발》 (LLM AI Agent Development Learned Through 7 Projects) ☆14 · Apr 1, 2025 · Updated 10 months ago
- awesome-audio-visual-robustness ☆11 · Jan 27, 2024 · Updated 2 years ago