yellowtownhz / sycophancy-interpretability
☆14Feb 5, 2025Updated last year
Alternatives and similar repositories for sycophancy-interpretability
Users that are interested in sycophancy-interpretability are comparing it to the libraries listed below
- CUPCase: Clinically Uncommon Patient Cases and Diagnoses Dataset☆14Oct 12, 2025Updated 4 months ago
- 🌿 Quickly generate a folder directory structure, with support for defining the directory depth and for output to a markdown file.☆13Oct 19, 2022Updated 3 years ago
- Official code for PLoP☆17Jun 30, 2025Updated 7 months ago
- Read emotion with a line of code 🎭☆17Jan 2, 2025Updated last year
- ☆45Jan 4, 2022Updated 4 years ago
- [ACL 2025] LongSafety: Evaluating Long-Context Safety of Large Language Models☆15Jun 18, 2025Updated 7 months ago
- Official implementation of Visco-Attack (EMNLP 2025 Main). We will progressively release the code and one-click reproduction scripts.☆28Aug 22, 2025Updated 5 months ago
- ☆14May 7, 2022Updated 3 years ago
- A CNN feature based image retrieval website☆15May 16, 2017Updated 8 years ago
- linux && windows compatible caffe☆13Dec 4, 2019Updated 6 years ago
- NAEP Math Assessment Item Score Prediction Challenge (Spring 2023)☆15Jun 8, 2023Updated 2 years ago
- Code for the paper "Safety Layers in Aligned Large Language Models: The Key to LLM Security" (ICLR 2025)☆21Apr 26, 2025Updated 9 months ago
- ☆22Mar 21, 2025Updated 10 months ago
- A desktop pet of the girl group Asoul! This work won second place and the Most Popular award in the ByteDance x Juejin 2022 Programming Challenge.☆16Jun 19, 2022Updated 3 years ago
- ☆13Feb 3, 2021Updated 5 years ago
- Official implementation repository for the paper Towards General Conceptual Model Editing via Adversarial Representation Engineering.☆18Dec 6, 2024Updated last year
- Our research proposes a novel MoGU framework that improves LLMs' safety while preserving their usability.☆18Jan 14, 2025Updated last year
- Code for safety test in "Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates"☆22Sep 21, 2025Updated 4 months ago
- Pushing CIFAR-10 SOTA using ResNets.☆16Oct 17, 2025Updated 3 months ago
- An implementation of SEAL: Safety-Enhanced Aligned LLM fine-tuning via bilevel data selection.☆22Feb 20, 2025Updated 11 months ago
- Official repository for the paper "High-Dimension Human Value Representation in Large Language Models" (NAACL'25 Main)☆23Jul 9, 2024Updated last year
- ☆26Jan 23, 2024Updated 2 years ago
- console.log for your stdio MCP server☆23Apr 1, 2025Updated 10 months ago
- This repository contains all the notes I took in the learning process of all the technologies during my study! This repository records what I noted while learning various technologies during my undergraduate studies…☆21Mar 14, 2023Updated 2 years ago
- tutorials☆22Aug 12, 2022Updated 3 years ago
- Code for "Adversarial Defense by Stratified Convolutional Sparse Coding"☆19Jul 27, 2019Updated 6 years ago
- ☆26Feb 7, 2023Updated 3 years ago
- ☆23Oct 27, 2023Updated 2 years ago
- This is the official code for the paper "Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning" (NeurIPS2024)☆25Sep 10, 2024Updated last year
- An analysis of which factors best predict the spread of forest fires using data from Portugal and California.☆160Aug 3, 2025Updated 6 months ago
- Tutorials that take an in depth look at how to view and manipulate DICOM images and how to get them ready for machine learning☆26Apr 12, 2023Updated 2 years ago
- ☆30Apr 26, 2025Updated 9 months ago
- ☆32Feb 11, 2025Updated last year
- Multi-Layer Sparse Autoencoders (ICLR 2025)☆29Feb 6, 2026Updated last week
- Model Selection with Large Language Models for Reasoning (EMNLP2023 Findings)☆30Dec 23, 2023Updated 2 years ago
- Code and Data for WWW'23 paper Reinforcement Learning-based Counter-Misinformation Response Generation: A Case Study of COVID-19 Vaccine …☆27Jun 28, 2023Updated 2 years ago
- ☆27Sep 22, 2021Updated 4 years ago
- Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks☆32Jul 9, 2024Updated last year
- alibabacloud-quantization-networks☆122Nov 8, 2019Updated 6 years ago