ckkissane / crosscoder-model-diff-replication
Open-source replication of Anthropic's Crosscoders for Model Diffing
☆13 · Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for crosscoder-model-diff-replication
- Universal Neurons in GPT2 Language Models ☆26 · Updated 5 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆46 · Updated 2 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆60 · Updated last month
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆25 · Updated 5 months ago
- The repository contains code for Adaptive Data Optimization ☆18 · Updated 3 weeks ago
- Official implementation of "BERTs are Generative In-Context Learners" ☆19 · Updated 4 months ago
- Repository for "I am a Strange Dataset: Metalinguistic Tests for Language Models" ☆39 · Updated 10 months ago
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model ☆41 · Updated 9 months ago
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆18 · Updated 2 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆61 · Updated 4 months ago
- Understanding how features learned by neural networks evolve throughout training ☆31 · Updated 2 weeks ago
- A library for efficient patching and automatic circuit discovery. ☆30 · Updated last month
- Repository for the paper "Stream of Search: Learning to Search in Language" ☆84 · Updated 3 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆25 · Updated last year
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆78 · Updated 8 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆27 · Updated 3 months ago
- Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?" ☆34 · Updated 7 months ago