zhrli324 / RLEditLinks

☆13

Alternatives and similar repositories for RLEdit

Users that are interested in RLEdit are comparing it to the libraries listed below

Sorting:

Unispac / shallow-vs-deep-alignment
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
☆142Updated 2 months ago
ChnQ / MI-Peaks
☆42Updated this week
wonderNefelibata / Awesome-LRM-Safety
Awesome Large Reasoning Model(LRM) Safety.This repository is used to collect security-related research on large reasoning models such as …
☆65Updated this week
ZFancy / awesome-activation-engineering
A curated list of resources for activation engineering
☆95Updated last month
yaojin17 / Unlearning_LLM
[ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models"
☆59Updated 9 months ago
shengliu66 / VTI
Code for Reducing Hallucinations in Vision-Language Models via Latent Space Steering
☆65Updated 7 months ago
itsqyh / Awesome-LMMs-Mechanistic-Interpretability
A curated collection of resources focused on the Mechanistic Interpretability (MI) of Large Multimodal Models (LMMs). This repository agg…
☆105Updated 3 weeks ago
VITA-Group / SEAL
Official code for SEAL: Steerable Reasoning Calibration of Large Language Models for Free
☆30Updated 3 months ago
Alsace08 / Chain-of-Embedding
[ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"
☆68Updated 6 months ago
jianghoucheng / AlphaEdit
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
☆282Updated last week
EIT-NLP / Awesome-Latent-CoT
This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space.
☆134Updated this week
boyiwei / alignment-attribution-code
[ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
☆80Updated 3 months ago
licong-lin / negative-preference-optimization
☆60Updated last year
DripNowhy / ETA
[ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"
☆24Updated 3 weeks ago
StarDewXXX / Awesome-Hybrid-CoT-Reasoning
☆51Updated last month
TianyunYoung / Hallucination-Attribution
This repo contains the code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attrib…
☆22Updated this week
LzVv123456 / VISTA
☆45Updated last month
AlexanderVNikitin / kernel-language-entropy
Code for Fine-grained Uncertainty Quantification for LLMs from Semantic Similarities (NeurIPS'24)
☆24Updated 7 months ago
KID-22 / LLM-Unlearning-Paper-List
☆28Updated last year
git-disl / awesome_LLM-harmful-fine-tuning-papers
A survey on harmful fine-tuning attack for large language model
☆193Updated 2 weeks ago
WangCheng0116 / Awesome-LRMs-Safety
Official repository for "Safety in Large Reasoning Models: A Survey" - Exploring safety risks, attacks, and defenses for Large Reasoning …
☆60Updated last month
VLKEB / VLKEB
☆55Updated 8 months ago
ydyjya / SafetyHeadAttribution
☆31Updated last month
TrustedLLM / UnKE
☆19Updated 5 months ago
AI45Lab / VLSBench
[ACL 2025] Data and Code for Paper VLSBench: Unveiling Visual Leakage in Multimodal Safety
☆48Updated 2 months ago
swj0419 / muse_bench
☆22Updated 4 months ago
wang2226 / Awesome-LLM-Decoding
📜 Paper list on decoding methods for LLMs and LVLMs
☆52Updated 2 weeks ago
isXinLiu / MM-SafetyBench
Accepted by ECCV 2024
☆142Updated 9 months ago
Jihuai-wpy / InferAligner
☆33Updated 9 months ago
ydyjya / LLM-IHS-Explanation
☆50Updated last year