Unofficial Implementation of Selective Attention Transformer
☆20 · Oct 31, 2024 · Updated last year
Alternatives and similar repositories for selective-attention-transformer
Users interested in selective-attention-transformer are comparing it to the repositories listed below.
- Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆11 · Dec 30, 2024 · Updated last year
- ☆19 · Jul 31, 2025 · Updated 9 months ago
- A Qwen 0.5B reasoning model trained on OpenR1-Math-220k ☆14 · Oct 11, 2025 · Updated 6 months ago
- Visualize any repo or codebase as a diagram or animation ☆23 · Oct 14, 2024 · Updated last year
- ☆18 · Jun 3, 2024 · Updated last year
- [NeurIPS 2024 Spotlight] Code for "Diffusion Model with Cross Attention as an Inductive Bias for Disentanglement" ☆20 · Jan 26, 2025 · Updated last year
- A no-strings API framework for deploying schema-based reasoning into third-party apps ☆23 · Mar 6, 2026 · Updated 2 months ago
- pytorch ☆10 · Apr 13, 2022 · Updated 4 years ago
- Learning to Skip the Middle Layers of Transformers ☆17 · Aug 7, 2025 · Updated 9 months ago
- ☆27 · Jun 29, 2025 · Updated 10 months ago
- ☆10 · Aug 26, 2022 · Updated 3 years ago
- Jupyter notebooks from our weekly (or so) hackathons ☆11 · Dec 3, 2024 · Updated last year
- Mixture of LoRA Experts