tonychenxyz / selfie
This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen, Carl Vondrick, and Chengzhi Mao.
☆55Dec 9, 2024Updated last year
Alternatives and similar repositories for selfie
Users who are interested in selfie are comparing it to the libraries listed below.
- ☆25Dec 20, 2023Updated 2 years ago
- ✒️ A gallery of experiments with Scalable Vector Graphics (SVG) and interactive visualizations.☆13Jan 6, 2023Updated 3 years ago
- ☆28Nov 16, 2025Updated 3 months ago
- Data and models for the paper "Configurable Safety Tuning of Language Models with Synthetic Preference Data"☆16Jul 27, 2024Updated last year
- Train text generation model with JavaScript.☆15Jul 14, 2024Updated last year
- Sparse Autoencoder Training Library☆56May 1, 2025Updated 9 months ago
- ☆12Oct 23, 2022Updated 3 years ago
- ☆146Dec 30, 2025Updated last month
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou…☆32Apr 20, 2024Updated last year
- This is the official repository for "Safer-Instruct: Aligning Language Models with Automated Preference Data"☆17Feb 22, 2024Updated last year
- ☆22Updated this week
- DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails☆30Feb 26, 2025Updated 11 months ago
- ☆17Feb 14, 2024Updated 2 years ago
- ☆33Jul 9, 2025Updated 7 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).☆243Dec 16, 2024Updated last year
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models☆19Aug 17, 2025Updated 5 months ago
- Training Transformers with knowledge localization (SGTM)☆48Jan 11, 2026Updated last month
- All in How You Ask for It: Simple Black-Box Method for Jailbreak Attacks☆18Apr 24, 2024Updated last year
- ☆17Aug 1, 2025Updated 6 months ago
- Social Network Analysis and STEM Education is designed to prepare researchers to apply network analysis in order to better understand and…☆14Jul 14, 2025Updated 7 months ago
- Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents"☆31Oct 12, 2023Updated 2 years ago
- ☆30Aug 2, 2024Updated last year
- Code repo for the model organisms and convergent directions of EM papers.☆49Sep 22, 2025Updated 4 months ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features☆27Nov 20, 2024Updated last year
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce…☆25Apr 18, 2024Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆27Apr 17, 2024Updated last year
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆69May 31, 2024Updated last year
- ☆27Oct 22, 2024Updated last year
- Package to optimize Adversarial Attacks against (Large) Language Models with Varied Objectives☆70Feb 22, 2024Updated last year
- Algebraic value editing in pretrained language models☆68Nov 1, 2023Updated 2 years ago
- ☆207Oct 14, 2025Updated 4 months ago
- Official Implementation for "In-Context Reinforcement Learning from Noise Distillation"☆34Sep 18, 2024Updated last year
- Using sparse coding to find distributed representations used by neural networks.☆296Nov 10, 2023Updated 2 years ago
- ☆17Sep 1, 2024Updated last year
- Writing Observer and Learning Observer: A system for monitoring learning process data, with an initial focus on writing process data from…☆12Feb 9, 2026Updated last week
- Plugin QGIS☆10Jan 16, 2023Updated 3 years ago
- Auditing agents for fine-tuning safety☆18Oct 21, 2025Updated 3 months ago
- This module is a tool for calculating correlations such as Partial, Tetrachoric, Intraclass correlation coefficients, Bootstrap agreement…☆11Jan 31, 2026Updated 2 weeks ago
- the small distributed language model toolkit; fine-tune state-of-the-art LLMs anywhere, rapidly☆32Oct 19, 2024Updated last year