Glaciohound/LM-Steer

Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)

☆149

Alternatives and similar repositories for LM-Steer

Users that are interested in LM-Steer are comparing it to the libraries listed below.

zhliu0106 / learning-to-refuse
Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"
☆10 · Dec 13, 2024 · Updated last year
Glaciohound / LM-Infinite
Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"
☆152 · Mar 13, 2025 · Updated last year
EnnengYang / Efficient-WEMoE
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.
☆16 · Oct 28, 2024 · Updated last year
HelloEveryboby / Butler
Butler is a tool project for automated service management and task scheduling.
☆17 · Jun 23, 2026 · Updated 2 weeks ago
nrimsky / CAA
Steering Llama 2 with Contrastive Activation Addition
☆240 · May 23, 2024 · Updated 2 years ago
chrisliu298 / awesome-representation-engineering
A resource repository for representation engineering in large language models
☆155 · Nov 14, 2024 · Updated last year
AI21Labs / factor
Code and data for the FACTOR paper
☆54 · Nov 15, 2023 · Updated 2 years ago
nju-websoft / MAGIC
Multi-Aspect Controllable Text Generation with Disentangled Counterfactual Augmentation, ACL 2024 (main)
☆14 · Sep 23, 2024 · Updated last year
boyiwei / CoTaEval
[NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models
☆17 · Jul 17, 2024 · Updated last year
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆138 · Jul 8, 2024 · Updated 2 years ago
OPTML-Group / WAGLE
Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"
☆19 · Dec 16, 2024 · Updated last year
1andrevich / antifilter-domain
Generated geosite.dat based on Antifilter Community List
☆28 · Updated this week
ShujinWu-0814 / MACAROON
Public code repo for EMNLP 2024 Findings paper "MACAROON: Training Vision-Language Models To Be Your Engaged Partners"
☆14 · Sep 28, 2024 · Updated last year
NJUPT-SAST / aurora-ui
🌏 UI component library for the future, based on WebComponent.
☆23 · Nov 12, 2024 · Updated last year
cisnlp / MEXA
[ACL 2025] 🔍 Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment
☆11 · Apr 6, 2025 · Updated last year
paul-rottger / xstest
Röttger et al. (NAACL 2024): "XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models"
☆137 · Feb 24, 2025 · Updated last year
sunblaze-ucb / rl-grok-recipe
Code repository for "RL Grokking Recipe: How RL Unlocks and Transfers New Algorithms in LLMs"
☆35 · Oct 12, 2025 · Updated 8 months ago
lyh6560new / P3Sum
The official code for the paper "What Constitutes a Faithful Summary? Preserving Author Perspectives in News Summarization"
☆10 · Jun 23, 2024 · Updated 2 years ago
llm-misinformation / llm-misinformation
The dataset and code for the ICLR 2024 paper "Can LLM-Generated Misinformation Be Detected?"
☆85 · Nov 9, 2024 · Updated last year
ethz-spylab / unlearning-vs-safety
☆27 · Oct 6, 2024 · Updated last year
zepingyu0512 / neuron-attribution
Code for the EMNLP 2024 paper "Neuron-Level Knowledge Attribution in Large Language Models"
☆52 · Nov 17, 2024 · Updated last year
qinyiwei / InfoBench
☆61 · Aug 22, 2024 · Updated last year
CaoYuanpu / BiPO
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
☆48 · Jul 28, 2024 · Updated last year
ydyjya / SafetyHeadAttribution
☆70 · Jun 1, 2025 · Updated last year
HKUNLP / multilingual-transfer
Code for the paper "Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability"
☆15 · Jun 13, 2023 · Updated 3 years ago
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆63 · Mar 30, 2024 · Updated 2 years ago
steering-vectors / steering-vectors
Steering vectors for transformer language models in PyTorch / Hugging Face
☆156 · Feb 21, 2025 · Updated last year
pillowsofwind / Knowledge-Conflicts-Survey
[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"
☆159 · Sep 21, 2024 · Updated last year
shangshang-wang / Resa
Resa: Transparent Reasoning Models via SAEs
☆49 · Sep 23, 2025 · Updated 9 months ago
Re-Align / AlignTDS
Analyzing LLM Alignment via Token Distribution Shift
☆17 · Jan 26, 2024 · Updated 2 years ago
amazon-science / bold
Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper
☆88 · Mar 2, 2021 · Updated 5 years ago
shadowkiller33 / Language_attack
A repo for LLM jailbreak
☆14 · Sep 5, 2023 · Updated 2 years ago
yikee / ScienceMeter
ScienceMeter: Tracking Scientific Knowledge Updates in Language Models
☆17 · Jun 28, 2025 · Updated last year
OPTML-Group / SOUL
Official repo for EMNLP'24 paper "SOUL: Unlocking the Power of Second-Order Optimization for LLM Unlearning"
☆30 · Oct 1, 2024 · Updated last year
msclar / symmtom
Code for the paper "Symmetric Machine Theory of Mind", presented at ICML 2022.
☆12 · Jul 18, 2022 · Updated 3 years ago
amazon-science / controllable-readability-summarization
Generating Summaries with Controllable Readability Levels (EMNLP 2023)
☆15 · Jul 2, 2026 · Updated last week
vyomakesh09 / longagent
LONGAGENT: Scaling Language Models to 128k Context through Multi-Agent Collaboration
☆11 · Mar 11, 2024 · Updated 2 years ago
AlphaLab-USTC / AlphaSteer
[ICLR 2026] The implementation of the paper "AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint"
☆61 · Nov 20, 2025 · Updated 7 months ago
git-disl / Vaccine
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024)
☆51 · Jan 15, 2026 · Updated 5 months ago