Personality Alignment of Language Models
☆53Jul 1, 2025Updated 8 months ago
Alternatives and similar repositories for PAlign
Users who are interested in PAlign are comparing it to the repositories listed below
- Your finetuned model's back to its original safety standards faster than you can say "SafetyLock"!☆11Oct 16, 2024Updated last year
- ControlLM is a method to control the personality traits and behaviors of language models in real-time at inference without costly trainin…☆19Nov 6, 2024Updated last year
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"☆14Mar 28, 2024Updated last year
- Code for LLM_Catastrophic_Forgetting via SAM.☆11Jun 7, 2024Updated last year
- [ICLR 2025] On Evaluating the Durability of Safeguards for Open-Weight LLMs☆13Jun 20, 2025Updated 8 months ago
- [AAAI 2024] MELO: Enhancing Model Editing with Neuron-indexed Dynamic LoRA☆27Apr 9, 2024Updated last year
- This repo is to demo the concept of lossless compression with Transformers as encoder and decoder.☆14May 2, 2024Updated last year
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.☆18Jan 14, 2025Updated last year
- Code for the paper "Self-Detoxifying Language Models via Toxification Reversal" (EMNLP 2023)☆18Oct 17, 2023Updated 2 years ago
- Codes for Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback (ACL 2024 Findings)☆16Jul 2, 2024Updated last year
- (ACL2025 Findings) Official code for the paper "STeCa: Step-level Trajectory Calibration for LLM Agent Learning"☆25Updated this week
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"☆22Sep 21, 2025Updated 5 months ago
- Official implementation for "ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation"☆21Jan 31, 2026Updated last month
- Codebase for "On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback". This repo implements a generative multi-tur…☆23Dec 3, 2024Updated last year
- Matching Natural Language Sentences with Hierarchical Sentence Factorization☆22Apr 26, 2018Updated 7 years ago
- Code and data for the paper: On the Reliability of Psychological Scales on Large Language Models☆30Dec 15, 2025Updated 2 months ago
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc…☆31Jan 28, 2026Updated last month
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts☆25Feb 23, 2024Updated 2 years ago
- This repo contains code for our NeurIPS 2023 spotlight paper: Evaluating and Inducing Personality in Pre-trained Language Models☆57Dec 7, 2023Updated 2 years ago
- [ACL 2024 main] Aligning Large Language Models with Human Preferences through Representation Engineering (https://aclanthology.org/2024.…☆28Sep 25, 2024Updated last year
- Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue (ACL 2024)☆25Oct 18, 2025Updated 4 months ago
- This repository provides the data and the codes used in the AAAI'24 paper, COOPER: Coordinating Specialized Agents towards a Complex Dial…☆27Mar 1, 2024Updated 2 years ago
- Code of ACL 2022 paper Debiased Contrastive Learning of Unsupervised Sentence Representations☆32Mar 16, 2022Updated 3 years ago
- Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation (EMNLP 2023)☆31Oct 18, 2025Updated 4 months ago
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings.☆31Dec 6, 2023Updated 2 years ago
- PreAct: Prediction Enhances Agent's Planning Ability (Coling2025)☆30Dec 12, 2024Updated last year
- Official Implementation for the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency"☆32Apr 12, 2025Updated 10 months ago
- A tool to generate optimized hardware files for univariate functions.☆15Sep 5, 2024Updated last year
- Sotopia-π: Interactive Learning of Socially Intelligent Language Agents (ACL 2024)☆81May 7, 2024Updated last year
- Superposition Yields Robust Neural Scaling☆58Feb 12, 2026Updated 3 weeks ago
- Repository for CharacterChat, a personalized social support system☆76Jul 13, 2024Updated last year
- Official code of paper "Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models"☆86May 27, 2025Updated 9 months ago
- PyTorch version of the PLATO dialog model with pre-trained parameters☆29May 20, 2022Updated 3 years ago
- Official completion of “Training on the Benchmark Is Not All You Need”.☆39Dec 31, 2024Updated last year