CUDA implementation of autoregressive linear attention, with all the latest research findings
☆46 · May 23, 2023 · Updated 3 years ago
Alternatives and similar repositories for autoregressive-linear-attention-cuda
Users that are interested in autoregressive-linear-attention-cuda are comparing it to the libraries listed below.
- Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk ☆47 · Jul 16, 2023 · Updated 2 years ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆100 · Aug 18, 2024 · Updated last year
- Implementation of GateLoop Transformer in Pytorch and Jax ☆92 · Jun 18, 2024 · Updated last year
- Toy genetic algorithm in Pytorch ☆56 · Apr 21, 2026 · Updated last month
- Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts ☆122 · Oct 17, 2024 · Updated last year
- Implementation of Zorro, Masked Multimodal Transformer, in Pytorch ☆98 · Oct 20, 2023 · Updated 2 years ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron… ☆17 · Nov 11, 2024 · Updated last year
- Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch ☆92 · Dec 22, 2023 · Updated 2 years ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Oct 22, 2023 · Updated 2 years ago
- Implementation of a simple BPE tokenizer, but in Nim ☆22 · Jul 2, 2023 · Updated 2 years ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- Implementation of a multimodal diffusion transformer in Pytorch ☆107 · Jun 24, 2024 · Updated last year
- A simple implementation of a deep linear Pytorch module ☆21 · Oct 16, 2020 · Updated 5 years ago
- Multi-Modal Multi-Embodied Hivemind-like Iteration of RTX-2 ☆15 · Jun 27, 2025 · Updated 10 months ago
- My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h… ☆54 · Jul 2, 2023 · Updated 2 years ago
- Standalone Product Key Memory module in Pytorch - for augmenting Transformer models ☆87 · Nov 1, 2025 · Updated 6 months ago
- Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer Architecture" ☆89 · Oct 13, 2023 · Updated 2 years ago
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in P… ☆208 · Feb 14, 2024 · Updated 2 years ago
- Implementation of the Llama architecture with RLHF + Q-learning ☆170 · Feb 1, 2025 · Updated last year
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 · Feb 13, 2023 · Updated 3 years ago
- A library for squeakily cleaning and filtering language datasets. ☆50 · Jul 10, 2023 · Updated 2 years ago
- PyTorch reimplementation of the paper "HyperMixer: An MLP-based Green AI Alternative to Transformers" [arXiv 2022]. ☆18 · Mar 28, 2022 · Updated 4 years ago
- ☆24 · Jun 18, 2024 · Updated last year
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch ☆76 · Dec 4, 2022 · Updated 3 years ago
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… ☆21 · May 12, 2026 · Updated last week
- Utilities for PyTorch distributed ☆25 · Feb 27, 2025 · Updated last year
- RWKV-7 mini ☆12 · Mar 29, 2025 · Updated last year
- Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new… ☆126 · Jul 26, 2024 · Updated last year
- Generate python ctypes classes from C headers. Requires LLVM clang ☆15 · Aug 14, 2024 · Updated last year
- Exploration into the Firefly algorithm in Pytorch ☆41 · Feb 14, 2025 · Updated last year
- A GPT, made only of MLPs, in Jax ☆59 · Jun 23, 2021 · Updated 4 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Oct 9, 2022 · Updated 3 years ago
- https://hf.co/hexgrad/Kokoro-82M ☆14 · Jan 14, 2026 · Updated 4 months ago
- ☆13 · Jun 3, 2024 · Updated last year
- tenstorrent kernel from twitch ☆28 · Mar 16, 2024 · Updated 2 years ago
- ☆21 · Mar 3, 2025 · Updated last year
- ☆34 · Sep 10, 2024 · Updated last year
- ☆15 · Nov 24, 2025 · Updated 6 months ago
- Implementation of Agent Attention in Pytorch ☆93 · Jul 10, 2024 · Updated last year