Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆231 · Sep 6, 2024 · Updated last year
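For context on what the repository implements: CoLT5 routes tokens conditionally, so every token passes through a cheap attention branch while a learned router selects a small subset for a heavier branch. Below is a minimal sketch of that routing idea, not the repository's actual API: the class name `RoutedAttentionSketch`, the use of `nn.MultiheadAttention` for both branches, and plain hard top-k with sigmoid gating are all simplifying assumptions (the real CoLT5 uses cheap local attention for the light branch, a differentiable top-k router, and routed feedforward blocks as well).

```python
# Minimal sketch of CoLT5-style conditional routing, assuming batch-first
# tensors of shape (batch, seq, dim) and seq >= heavy_tokens.
# Illustrative only; not the repository's actual API.
import torch
from torch import nn

class RoutedAttentionSketch(nn.Module):
    def __init__(self, dim, heads=8, heavy_tokens=64):
        super().__init__()
        self.router = nn.Linear(dim, 1, bias=False)  # one relevance score per token
        self.light_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.heavy_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.heavy_tokens = heavy_tokens

    def forward(self, x):
        # light branch: every token is processed
        # (stand-in for CoLT5's cheap local attention)
        light_out, _ = self.light_attn(x, x, x)

        # router picks the top-k tokens that get the expensive branch
        gates = torch.sigmoid(self.router(x).squeeze(-1))            # (batch, seq)
        topk_gates, topk_idx = gates.topk(self.heavy_tokens, dim=-1)

        batch_idx = torch.arange(x.size(0), device=x.device).unsqueeze(-1)
        routed = x[batch_idx, topk_idx]                              # (batch, k, dim)

        # heavy branch: routed queries attend over the full sequence;
        # gating by the router score lets the router receive gradients
        heavy_out, _ = self.heavy_attn(routed, x, x)
        heavy_out = heavy_out * topk_gates.unsqueeze(-1)

        out = light_out.clone()
        out[batch_idx, topk_idx] = out[batch_idx, topk_idx] + heavy_out
        return out

# usage: tokens = torch.randn(1, 1024, 512); out = RoutedAttentionSketch(512)(tokens)
```

Gating the heavy branch's output by the router score is the detail that matters here: the hard top-k selection itself is non-differentiable, so the multiplicative gate is what gives the router a training signal.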
Alternatives and similar repositories for CoLT5-attention
Users that are interested in CoLT5-attention are comparing it to the libraries listed below.
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆379 · Jun 17, 2024 · Updated last year
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- My attempts at applying the SoundStream design to learned tokenization of text, then applying hierarchical attention to text generation ☆90 · Oct 11, 2024 · Updated last year
- Implementation of Block Recurrent Transformer - Pytorch ☆224 · Aug 20, 2024 · Updated last year
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Aug 26, 2023 · Updated 2 years ago
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based … ☆11 · Mar 18, 2023 · Updated 3 years ago
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in Pytorch ☆207 · Feb 14, 2024 · Updated 2 years ago
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆655 · Dec 27, 2024 · Updated last year
- Exploring finetuning public checkpoints on filtered 8K sequences from the Pile ☆115 · Mar 22, 2023 · Updated 3 years ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Oct 22, 2023 · Updated 2 years ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Jun 6, 2024 · Updated last year
- 🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch ☆2,183 · Nov 27, 2024 · Updated last year
- Code repository for the c-BTM paper ☆108 · Sep 26, 2023 · Updated 2 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 · Feb 13, 2023 · Updated 3 years ago
- Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch ☆879 · Oct 30, 2023 · Updated 2 years ago
- Utilities for PyTorch distributed ☆25 · Feb 27, 2025 · Updated last year
- Implementation of AudioLM, a SOTA Language Modeling Approach to Audio Generation out of Google Research, in Pytorch ☆2,617 · Jan 12, 2025 · Updated last year
- Implementation of a holodeck, written in Pytorch ☆18 · Nov 1, 2023 · Updated 2 years ago
- Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors ☆641 · Jul 17, 2023 · Updated 2 years ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆563 · Dec 28, 2024 · Updated last year
- Implementation of a memory-efficient multi-head attention as proposed in the paper "Self-attention Does Not Need O(n²) Memory" (a minimal sketch of the chunked-softmax idea appears after this list) ☆389 · Jul 18, 2023 · Updated 2 years ago
- Implementation of GigaGAN, new SOTA GAN out of Adobe. Culmination of nearly a decade of research into GANs ☆1,942 · Jan 12, 2025 · Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Apr 17, 2024 · Updated last year
- This is a new metric that can be used to evaluate the faithfulness of text generated by LLMs. The work behind this repository can be found here ☆31 · Aug 25, 2023 · Updated 2 years ago
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆804 · Jan 30, 2026 · Updated last month
- Implementation of SelfExtend from the paper "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning" in Pytorch and Zeta ☆13 · Nov 11, 2024 · Updated last year
- Reverse Instructions to generate instruction tuning data with corpus examples ☆214 · Mar 5, 2024 · Updated 2 years ago
- Recursive Bayesian Networks ☆11 · May 11, 2025 · Updated 10 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆250 · Jun 6, 2025 · Updated 9 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆55 · Feb 24, 2026 · Updated last month
- ☆54 · May 20, 2024 · Updated last year
- Implementation of Flash Attention in Jax ☆227 · Mar 1, 2024 · Updated 2 years ago
- Understand and test language model architectures on synthetic tasks. ☆263 · Updated this week
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways ☆828 · Nov 9, 2022 · Updated 3 years ago
- Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input" ☆1,065 · Mar 7, 2024 · Updated 2 years ago
- All-in-one repository for Fine-tuning & Pretraining (Large) Language Models ☆15 · Mar 8, 2023 · Updated 3 years ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper "MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices" ☆15 · Mar 11, 2024 · Updated 2 years ago
- ☆31 · Jul 2, 2023 · Updated 2 years ago
- TART: A plug-and-play Transformer module for task-agnostic reasoning ☆202 · Jun 22, 2023 · Updated 2 years ago
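As referenced in the memory-efficient attention entry above, here is a minimal sketch of the chunked-softmax trick from "Self-attention Does Not Need O(n²) Memory", assuming single-head, batch-first tensors; `chunked_attention` and `chunk_size` are illustrative names, not that repository's API. Keys and values are processed one chunk at a time with a running max and running normalizer, so the full n × n score matrix is never materialized.

```python
# Sketch of attention with O(n)-in-sequence-length memory, assuming
# q, k, v of shape (batch, seq, dim). Illustrative only.
import torch

def chunked_attention(q, k, v, chunk_size=1024):
    scale = q.shape[-1] ** -0.5
    running_max = torch.full(q.shape[:-1], float("-inf"),
                             device=q.device, dtype=q.dtype).unsqueeze(-1)
    running_sum = torch.zeros_like(running_max)  # running softmax normalizer
    acc = torch.zeros_like(q)                    # running weighted sum of values

    for k_chunk, v_chunk in zip(k.split(chunk_size, dim=1), v.split(chunk_size, dim=1)):
        scores = q @ k_chunk.transpose(-1, -2) * scale           # (batch, seq_q, chunk)
        new_max = torch.maximum(running_max, scores.amax(dim=-1, keepdim=True))

        # rescale previous accumulators to the new running max, then fold in this chunk
        correction = (running_max - new_max).exp()
        exp_scores = (scores - new_max).exp()
        acc = acc * correction + exp_scores @ v_chunk
        running_sum = running_sum * correction + exp_scores.sum(dim=-1, keepdim=True)
        running_max = new_max

    return acc / running_sum

# agrees with ordinary softmax attention, without ever forming the n x n matrix
q, k, v = (torch.randn(2, 4096, 64) for _ in range(3))
out = chunked_attention(q, k, v)
```

The running-max rescaling is the same numerically stable online softmax that Flash Attention builds on; the difference is that this version trades the n × n memory for a Python loop rather than a fused kernel.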