tonychenxyz / selfieLinks
This repository contains the code and data for the paper "SelfIE: Self-Interpretation of Large Language Model Embeddings" by Haozhe Chen, Carl Vondrick, and Chengzhi Mao.
☆50Updated 6 months ago
Alternatives and similar repositories for selfie
Users that are interested in selfie are comparing it to the libraries listed below
Sorting:
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆59Updated last year
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024)☆112Updated last year
- [ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning☆94Updated last year
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning☆37Updated last week
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆124Updated 3 months ago
- ☆44Updated 3 months ago
- [ICLR 2025] Code and Data Repo for Paper "Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation"☆64Updated 6 months ago
- [ACL 2025] Knowledge Unlearning for Large Language Models☆37Updated last month
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆121Updated 9 months ago
- ☆30Updated last year
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective☆32Updated 4 months ago
- [ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style☆48Updated last month
- code for EMNLP 2024 paper: Neuron-Level Knowledge Attribution in Large Language Models☆35Updated 7 months ago
- Official Code Repository for LM-Steer Paper: "Word Embeddings Are Steers for Language Models" (ACL 2024 Outstanding Paper Award)☆114Updated 8 months ago
- ☆37Updated last year
- A Sober Look at Language Model Reasoning☆74Updated last week
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight)☆88Updated 8 months ago
- Interpretable Contrastive Monte Carlo Tree Search Reasoning☆48Updated 7 months ago
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models"☆25Updated last week
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards☆44Updated 2 months ago
- Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting☆32Updated last year
- ☆46Updated 7 months ago
- A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity.☆72Updated 3 months ago
- [ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)☆79Updated 8 months ago
- FeatureAlignment = Alignment + Mechanistic Interpretability☆28Updated 3 months ago
- ☆35Updated 3 months ago
- ☆40Updated last year
- [ICML 2024] One Prompt is Not Enough: Automated Construction of a Mixture-of-Expert Prompts - TurningPoint AI☆26Updated 9 months ago
- [ICML 2024] Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications☆79Updated 2 months ago
- ☆41Updated 8 months ago