zhrli324 / RLEdit
☆12 · Updated 3 months ago
Alternatives and similar repositories for RLEdit
Users interested in RLEdit are comparing it to the repositories listed below.
- [ACL 2024] Code and data for "Machine Unlearning of Pre-trained Large Language Models" ☆57 · Updated 8 months ago
- ☆38 · Updated 3 months ago
- Code for "Reducing Hallucinations in Vision-Language Models via Latent Space Steering" ☆57 · Updated 6 months ago
- An implementation of SEAL: Safety-Enhanced Aligned LLM fine-tuning via bilevel data selection ☆16 · Updated 3 months ago
- [ICLR 2025] Code and data repo for the paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation" ☆59 · Updated 5 months ago
- Code for the paper "Aligning Large Language Models with Representation Editing: A Control Perspective" ☆32 · Updated 4 months ago
- ☆46 · Updated 6 months ago
- ☆58 · Updated 10 months ago
- Code for the ICLR 2025 paper "GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment" ☆10 · Updated 3 months ago
- A curated list of resources for activation engineering ☆85 · Updated last week
- Official repository for the paper "Safety Alignment Should Be Made More Than Just a Few Tokens Deep" ☆126 · Updated last month
- [ICML 2024] "Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications" ☆79 · Updated 2 months ago
- [ICLR 2025] PyTorch implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time" ☆21 · Updated 2 weeks ago
- ☆22 · Updated 2 months ago
- ☆40 · Updated 3 months ago
- "In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation" (ICML 2024) ☆59 · Updated last year
- ☆33 · Updated 8 months ago
- Official code for "SEAL: Steerable Reasoning Calibration of Large Language Models for Free" ☆25 · Updated 2 months ago
- Awesome Large Reasoning Model (LRM) Safety. This repository is used to collect security-related research on large reasoning models such as … ☆64 · Updated this week
- [ICLR 2025] "MLLM Can See? Dynamic Correction Decoding for Hallucination Mitigation" ☆82 · Updated 5 months ago
- ☆28 · Updated 11 months ago
- Code for the paper "Understanding and Mitigating Hallucinations in Large Vision-Language Models via Modular Attrib…" ☆17 · Updated 2 months ago
- [NAACL 2025 Main] Official implementation of MLLMU-Bench ☆26 · Updated 2 months ago
- ☆26 · Updated last week
- [ECCV 2024] Official PyTorch implementation of "How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs" ☆81 · Updated last year
- A regularly updated paper list for LLMs-reasoning-in-latent-space ☆107 · Updated this week
- ☆33 · Updated last week
- ☆49 · Updated 11 months ago
- A versatile toolkit for applying Logit Lens to modern large language models (LLMs). Currently supports Llama-3.1-8B and Qwen-2.5-7B, enab… ☆85 · Updated 3 months ago
- Official PyTorch implementation of "Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations" (ICLR '25) ☆73 · Updated last week