safety-research/selective-gradient-masking


Training Transformers with knowledge localization (SGTM)

☆54

Alternatives and similar repositories for selective-gradient-masking

Users that are interested in selective-gradient-masking are comparing it to the libraries listed below.

licong-lin / negative-preference-optimization
☆76 · Jul 15, 2024 · Updated 2 years ago
UCSB-NLP-Chang / ULD
Implementation of paper 'Reversing the Forget-Retain Objectives: An Efficient LLM Unlearning Framework from Logit Difference' [NeurIPS'24…
☆26 · Jun 14, 2024 · Updated 2 years ago
cassidylaidlaw / orpo
☆24 · Nov 11, 2024 · Updated last year
anthropic-experimental / automated-auditing
Prompts used in the Automated Auditing Blog Post
☆167 · Jul 24, 2025 · Updated last year
xsddys / TRACE
TRACE, a framework for turn-aware credit assignment for multi-turn jailbreak optimization
☆19 · Jun 22, 2026 · Updated last month
watcl-lab / positional_attention
Source code for the paper "Positional Attention: Expressivity and Learnability of Algorithmic Computation"
☆14 · May 26, 2025 · Updated last year
shengliu66 / FractionalReason
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 · Jun 30, 2025 · Updated last year
znowu / CliqueFlowmer
Code with CliqueFlowmer model for Optimal Computational Materials Discovery
☆17 · Apr 21, 2026 · Updated 3 months ago
aisa-group / promptinject-agent-skills
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
☆21 · Jul 2, 2026 · Updated 3 weeks ago
aHapBean / xHC
[Tech Report] Expanded Hyper-Connections
☆49 · Updated this week
goombalab / Gather-and-Aggregate
Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"
☆16 · Apr 30, 2025 · Updated last year
safety-research / open-source-alignment-faking
Open Source Replication of Anthropic's Alignment Faking Paper
☆58 · Apr 4, 2025 · Updated last year
CaoYuanpu / BiPO
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
☆50 · Jul 28, 2024 · Updated last year
kethan-1818 / 5G-channel-modulation-using-RL
I have developed a custom environment using OpenAI Gym in Python for simulating a 5G wireless communication channel as part of a reinforc…
☆14 · Mar 27, 2024 · Updated 2 years ago
PierreEtienneJ / twitter-graph-auto-report
A python script to write a report automatically in docx for a twitter-graph
☆14 · Apr 14, 2022 · Updated 4 years ago
jsikyoon / V-MPO_torch
V-MPO torch version with DMLab30 and GTrXL
☆13 · Mar 1, 2021 · Updated 5 years ago
ulab-uiuc / Multi-agent-evolve
☆153 · Jan 21, 2026 · Updated 6 months ago
AiltonOliveir / RL-env-for-communications
Reinforcement learning environment for MIMO communications.
☆15 · Jul 2, 2021 · Updated 5 years ago
chaufanglin / Normal2Whisper
Implementation of "Improving Whispered Speech Recognition Performance using Pseudo-whispered based Data Augmentation"
☆14 · Oct 31, 2024 · Updated last year
zhaochen0110 / Timo
Code and data for "Timo: Towards Better Temporal Reasoning for Language Models" (COLM 2024)
☆26 · Oct 23, 2024 · Updated last year
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆29 · Updated this week
tag-and-generate / Politeness-Transfer-A-Tag-and-Generate-Approach
☆24 · Mar 3, 2021 · Updated 5 years ago
tonyzhao-jt / LLM-PQ
Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …
☆39 · Aug 29, 2025 · Updated 10 months ago
godaai / llm-inference
Resources for Large Language Model Inference
☆17 · Dec 29, 2023 · Updated 2 years ago
muellerzr / import-timer
Pragmatic approach to parsing import profiles for CI's
☆12 · Jul 1, 2024 · Updated 2 years ago
chentong0 / rl-binary-rar
Official repo for "Binary Retrieval-augmented Reward Mitigates Hallucinations"
☆15 · Nov 13, 2025 · Updated 8 months ago
CSIPlab / SLUG
Official repository for Targeted Unlearning with Single Layer Unlearning Gradient (SLUG), ICML 2025
☆18 · Aug 10, 2025 · Updated 11 months ago
bluvolve-dev / reactive-course-service-with-nextjs-ui-
☆11 · Oct 15, 2020 · Updated 5 years ago
ariahw / rl-rewardhacking
☆44 · Feb 18, 2026 · Updated 5 months ago
edward-playground / aidefend-mcp
AIDEFEND MCP is a local-first AI Security Defensive Assistant that brings the full AIDEFEND countermeasure library into your environment …
☆19 · Updated this week
LUMIA-Group / ConceptLM
Official Implementation of ConceptLM.
☆23 · Mar 18, 2026 · Updated 4 months ago
dame-cell / Triformer
Transformers components but in Triton
☆34 · May 9, 2025 · Updated last year
SeunggeunKimkr / PRISM
[ICML 2026] Public repository for fine-tuning Masked Diffusion Models toward provable self-correction.
☆26 · Jul 5, 2026 · Updated 3 weeks ago
wckdman-zz / jigsaw-toxic-pytorch
PyTorch pipeline for Kaggle Jigsaw Toxic Comment Classification Challenge
☆11 · Mar 23, 2018 · Updated 8 years ago
MasterZhou1 / Reasoning-Flow
Code for Paper "The Geometry of Reasoning: Flowing Logics in Representation Space" (ICLR 2026)
☆60 · Jan 31, 2026 · Updated 5 months ago
wassname / rl_2d_walker.js
Teaching a humanoid to walk(ish), then displaying in your browser (using tensorflow.js and reinforcement learning)
☆10 · Sep 7, 2020 · Updated 5 years ago
rynewu224 / GraphDA
Unsupervised Domain Adaptation on Graphs
☆15 · Apr 6, 2022 · Updated 4 years ago
ssmisya / VLMLT
[CVPR' 25] Official repo for From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Cal…
☆22 · Jun 6, 2025 · Updated last year
parameterlab / dr-llm
[ICLR 2026 🔥] Dr.LLM: Dynamic Layer Routing in LLMs
☆56 · Apr 24, 2026 · Updated 3 months ago