ayyucekizrak / Mechanistic-Interpretability
Mechanistic Interpretability in Transformers: This repository explores advanced techniques like Induction Head Detection and QK Circuit Analysis to uncover the inner workings of transformer-based models.
☆28Updated 11 months ago
Alternatives and similar repositories for Mechanistic-Interpretability
Users who are interested in Mechanistic-Interpretability are comparing it to the libraries listed below.
- A curated list of Turkish AI models, datasets, papers☆45Updated this week
- This repository collects all relevant resources about interpretability in LLMs☆372Updated 10 months ago
- Sparsify transformers with SAEs and transcoders☆620Updated last week
- Sparse Autoencoder for Mechanistic Interpretability☆264Updated last year
- An advanced tokenizer infrastructure, based on linguistic rules, for processing multilingual text and preserving semantic coherence…☆15Updated last month
- ☆190Updated 9 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research).☆219Updated 9 months ago
- ⏰ AI conference deadline countdowns☆280Updated last week
- ☆168Updated 10 months ago
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …☆211Updated last week
- ☆81Updated 6 months ago
- ☆345Updated 3 weeks ago
- ☆54Updated 10 months ago
- Using sparse coding to find distributed representations used by neural networks.☆269Updated last year
- ☆518Updated last year
- Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".☆269Updated 3 months ago
- This project is designed as an AI-powered content creation studio. Users' text, image, video, and other…☆20Updated 3 months ago
- Training Sparse Autoencoders on Language Models☆958Updated last week
- official repository for “Reinforcement Learning for Reasoning in Large Language Models with One Training Example”☆357Updated last week
- ☆121Updated 2 months ago
- ☆17Updated 5 months ago
- Turkish LM Tuner☆84Updated 10 months ago
- An extension of the nanoGPT repository for training small MoE models.☆187Updated 6 months ago
- ☆48Updated last month
- Open source replication of Anthropic's Crosscoders for Model Diffing☆59Updated 10 months ago
- Unofficial implementation of https://arxiv.org/pdf/2407.14679☆48Updated last year
- This repo contains Lyra AI's work in the E-Commerce Hackathon organized by Trendyol and Teknofest.☆13Updated 10 months ago
- ☆84Updated 5 months ago
- Steering vectors for transformer language models in PyTorch / Hugging Face☆124Updated 6 months ago
- Tools for optimizing steering vectors in LLMs.☆11Updated 5 months ago