hbseong97 / HarmAug
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
☆12Updated 4 months ago
Alternatives and similar repositories for HarmAug
Users who are interested in HarmAug are comparing it to the repositories listed below.
- Official repo of dataset-decomposition paper [NeurIPS 2024]☆19Updated 6 months ago
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors"☆14Updated 7 months ago
- Confidence Regulation Neurons in Language Models (NeurIPS 2024)☆10Updated 5 months ago
- The Official Code Repo for EgoOrientBench [CVPR25]☆11Updated 2 months ago
- Official PyTorch implementation of "Query-Efficient Black-Box Red Teaming via Bayesian Optimization" (ACL'23)☆15Updated 2 years ago
- The repo for paper: Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models.☆10Updated 6 months ago
- Official implementation of Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs (ICLR 2024).☆42Updated 11 months ago
- Lightweight Hybrid Search and Reranking☆10Updated 3 months ago
- Implementation of PatchSAE as presented in "Sparse autoencoders reveal selective remapping of visual concepts during adaptation"☆18Updated 2 months ago
- 🤫 Code and benchmark for our ICLR 2024 spotlight paper: "Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Con…☆42Updated last year
- All-in-one repository for Fine-tuning & Pretraining (Large) Language Models☆15Updated 2 years ago
- Common tools for data processing☆16Updated 3 months ago
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"☆12Updated 3 months ago
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…☆20Updated last year
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin…☆51Updated last year
- [ACL 2023] Gradient Ascent Post-training Enhances Language Model Generalization☆29Updated 10 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆43Updated last year
- This is an official implementation of the paper "Building Math Agents with Multi-Turn Iterative Preference Learning" with multi-turn DP…☆27Updated 7 months ago
- User-friendly viewer for Parquet files☆9Updated 8 months ago
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers☆68Updated 3 weeks ago
- Official PyTorch implementation of "Neural Relation Graph: A Unified Framework for Identifying Label Noise and Outlier Data" (NeurIPS'23)☆15Updated last year
- LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations☆18Updated last month