☆58 Feb 12, 2024 Updated 2 years ago
Alternatives and similar repositories for insider-trading
Users who are interested in insider-trading are comparing it to the libraries listed below
- ☆20 May 25, 2024 Updated last year
- A TinyStories LM with SAEs and transcoders ☆14 Apr 3, 2025 Updated 11 months ago
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography. ☆25 Jan 26, 2024 Updated 2 years ago
- ☆21 Jun 22, 2025 Updated 8 months ago
- A tiny easily hackable implementation of a feature dashboard. ☆15 Oct 21, 2025 Updated 4 months ago
- ☆20 Nov 15, 2024 Updated last year
- Open Source Replication of Anthropic's Alignment Faking Paper ☆54 Apr 4, 2025 Updated 10 months ago
- Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique ☆18 Aug 22, 2024 Updated last year
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors" ☆19 Dec 14, 2024 Updated last year
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆16 Sep 15, 2023 Updated 2 years ago
- ☆17 Updated this week
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆131 Mar 9, 2024 Updated last year
- Exploring Model Kinship for Merging Large Language Models ☆27 Apr 16, 2025 Updated 10 months ago
- gpt completions in vscode ☆35 Mar 24, 2023 Updated 2 years ago
- ☆25 Nov 11, 2025 Updated 3 months ago
- ☆265 Jan 12, 2026 Updated last month
- ☆22 Sep 9, 2021 Updated 4 years ago
- Ember is a hosted API/SDK that lets you shape AI model behavior by directly controlling a model's internal units of computation, or "feat… ☆43 Jul 14, 2025 Updated 7 months ago
- ☆25 Sep 5, 2024 Updated last year
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆20 Oct 18, 2024 Updated last year
- Sparse Autoencoder Training Library ☆55 May 1, 2025 Updated 10 months ago
- Thorn in a HaizeStack test for evaluating long-context adversarial robustness. ☆26 Aug 3, 2024 Updated last year
- Code to enable layer-level steering in LLMs using sparse auto encoders ☆31 Sep 18, 2025 Updated 5 months ago
- ☆41 Jul 6, 2025 Updated 7 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Feb 23, 2024 Updated 2 years ago
- ☆27 Oct 6, 2024 Updated last year
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆64 Oct 27, 2024 Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆28 May 23, 2024 Updated last year
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆34 Oct 28, 2025 Updated 4 months ago
- [EMNLP 2024] "Revisiting Who's Harry Potter: Towards Targeted Unlearning from a Causal Intervention Perspective" ☆32 Jul 22, 2024 Updated last year
- ☆28 May 4, 2023 Updated 2 years ago
- ☆89 Dec 18, 2025 Updated 2 months ago
- Accompanying code for "Boosted Prompt Ensembles for Large Language Models" ☆30 Apr 13, 2023 Updated 2 years ago
- ☆13 Oct 5, 2025 Updated 4 months ago
- Plan✕ is a platform for creating and publishing digital planning services ☆17 Updated this week
- Sparse Autoencoder for Mechanistic Interpretability ☆292 Jul 20, 2024 Updated last year
- Situational Awareness Dataset ☆46 Dec 14, 2024 Updated last year
- Shaping capabilities with token-level pretraining data filtering ☆83 Jan 28, 2026 Updated last month
- A virtual caregiver system that extracts the expression of mental and physical health states through dialogue-based human-computer intera… ☆14 Jan 29, 2023 Updated 3 years ago