amudide / switch_sae
Efficient Dictionary Learning with Switch Sparse Autoencoders (SAEs)
☆13 · Updated last month
Related projects
Alternatives and complementary repositories for switch_sae
- Learning to Retrieve by Trying - Source code for "Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval" ☆24 · Updated last week
- Code accompanying the paper "A Language Model's Guide Through Latent Space". It contains functionality for training and using concept vec… ☆16 · Updated 8 months ago
- Lottery Ticket Adaptation ☆36 · Updated last month
- microjax, a micro Jax-like function transformation engine ☆26 · Updated 2 weeks ago
- DPO, but faster 🚀 ☆21 · Updated 2 weeks ago
- Training hybrid models for dummies. ☆15 · Updated 2 weeks ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated 11 months ago
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" in PyTorch and Zeta ☆13 · Updated this week
- A place to store reusable transformer components of my own creation or found on the interwebs ☆43 · Updated this week
- LLM training in simple, raw C/CUDA ☆12 · Updated last month
- Understanding how features learned by neural networks evolve throughout training ☆31 · Updated 2 weeks ago
- Code for "Accelerating Training with Neuron Interaction and Nowcasting Networks" ☆14 · Updated last week
- Using FlexAttention to compute attention with different masking patterns ☆40 · Updated last month
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆36 · Updated 7 months ago
- Minimum Description Length probing for neural network representations ☆16 · Updated last week
- We introduce EMMET and unify model editing with the popular algorithms ROME and MEMIT. ☆12 · Updated 2 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated 8 months ago
- Latent Large Language Models ☆16 · Updated 2 months ago
- Efficient scaling laws and collaborative pretraining. ☆13 · Updated 2 weeks ago
- An alternative way of calculating self-attention ☆18 · Updated 5 months ago
- Understanding the correlation between different LLM benchmarks ☆29 · Updated 10 months ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification ☆11 · Updated last year
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆37 · Updated 5 months ago