Yet another random morning idea, to be quickly tried and the architecture shared if it works: allow the transformer to pause for any amount of time on any token
☆53 · Oct 22, 2023 · Updated 2 years ago
Alternatives and similar repositories for pause-transformer
Users that are interested in pause-transformer are comparing it to the libraries listed below.
- Implementation of a holodeck, written in PyTorch · ☆18 · Nov 1, 2023 · Updated 2 years ago
- Explorations into the recently proposed Taylor Series Linear Attention · ☆100 · Aug 18, 2024 · Updated last year
- Debate interface, experiments, etc. · ☆10 · Mar 12, 2024 · Updated 2 years ago
- ☆14 · Jan 16, 2024 · Updated 2 years ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. · ☆67 · Apr 24, 2024 · Updated last year
- ☆24 · Jan 27, 2026 · Updated last month
- ☆32 · May 30, 2024 · Updated last year
- ☆19 · Feb 6, 2026 · Updated last month
- My attempts at applying the SoundStream design to learned tokenization of text and then applying hierarchical attention to text generation · ☆90 · Oct 11, 2024 · Updated last year
- Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE · ☆19 · Sep 22, 2021 · Updated 4 years ago
- Implementation of VisionLLaMA from the paper "VisionLLaMA: A Unified LLaMA Interface for Vision Tasks" in PyTorch and Zeta · ☆16 · Nov 11, 2024 · Updated last year
- Implementation of the algorithm detailed in the paper "Evolutionary design of molecules based on deep learning and a genetic algorithm" · ☆24 · Dec 15, 2023 · Updated 2 years ago
- A place to store reusable transformer components of my own creation or found on the interwebs · ☆75 · Updated this week
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… · ☆39 · Mar 2, 2023 · Updated 3 years ago
- PyTorch and NNsight implementation of AtP* (Kramár et al., 2024, DeepMind) · ☆20 · Jan 19, 2025 · Updated last year
- ☆44 · Nov 17, 2024 · Updated last year
- ☆48 · Jan 3, 2026 · Updated 2 months ago
- Code reproducing the results in the article Aguilera, M, Morales, PA, Rosas, FE, M & Shimazaki H (2025) · ☆36 · Feb 19, 2026 · Updated last month
- An implementation of the (Induced) Set Attention Block from the Set Transformer paper · ☆67 · Jan 10, 2023 · Updated 3 years ago
- Stick-breaking attention · ☆63 · Jul 1, 2025 · Updated 8 months ago
- ☆52 · Feb 5, 2025 · Updated last year
- Course website for "AI618: Generative Model and Unsupervised Learning" · ☆37 · May 23, 2023 · Updated 2 years ago
- This is the official PyTorch implementation of the HLGP algorithm used to solve large-scale CVRP. · ☆10 · Feb 13, 2025 · Updated last year
- This is an official implementation of GRIT-VLP · ☆20 · Aug 8, 2022 · Updated 3 years ago
- Implementation of Discrete Key / Value Bottleneck, in PyTorch · ☆88 · Jul 9, 2023 · Updated 2 years ago
- Ready-to-run PyTorch implementation of Data2Vec 2.0: Highly efficient self-supervised representation learning for vision, speech and text… · ☆16 · Mar 29, 2023 · Updated 2 years ago
- Implementation of the specific Transformer architecture from PaLM: Scaling Language Modeling with Pathways · ☆828 · Nov 9, 2022 · Updated 3 years ago
- [WIP] Transformer to embed Danbooru labelsets · ☆13 · Mar 31, 2024 · Updated last year
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" · ☆56 · Oct 27, 2025 · Updated 4 months ago
- ☆27 · Feb 12, 2026 · Updated last month
- Interactive shader with TensorFlow.js facemesh · ☆15 · Dec 7, 2022 · Updated 3 years ago
- Implementation of Flash Attention in JAX · ☆227 · Mar 1, 2024 · Updated 2 years ago
- Official implementation of the NeurIPS'23 paper "Cross-Episodic Curriculum for Transformer Agents" · ☆31 · Oct 12, 2023 · Updated 2 years ago
- ☆13 · Dec 12, 2023 · Updated 2 years ago
- ☆15 · Mar 15, 2022 · Updated 4 years ago
- Multi-step reasoning MLLM · ☆16 · Mar 8, 2026 · Updated 2 weeks ago
- Implementation of the model MC-ViT from the paper "Memory Consolidation Enables Long-Context Video Understanding" · ☆27 · Mar 13, 2026 · Updated last week
- PITS: Variational Pitch Inference for End-to-end Pitch-controllable TTS without External Pitch Predictor · ☆17 · Apr 13, 2023 · Updated 2 years ago
- [ICML2025] Official repo for the paper "Optimizing Temperature for Language Models with Multi-Sample Inference" · ☆22 · Feb 16, 2025 · Updated last year