A simple cross attention that updates both the source and target in one step
☆195 · Jul 29, 2025 · Updated 7 months ago
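The description above can be illustrated with a minimal sketch. This is not the repository's code or API: it assumes a single attention head, no learned projections (queries, keys, and values are the raw inputs), and a hypothetical function name `bidirectional_cross_attention`. The idea shown is that one shared similarity matrix is normalized along each axis, so the source and the target are both updated from the same computation.

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of floats.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def bidirectional_cross_attention(src, tgt):
    """Hypothetical sketch: src is n x d, tgt is m x d (lists of lists).

    Returns (new_src, new_tgt) with the same shapes as the inputs.
    Assumption: identity projections, one head, no masking or dropout.
    """
    d = len(src[0])
    scale = 1.0 / math.sqrt(d)
    # One shared similarity matrix (n x m), computed once.
    sim = [[scale * v for v in row] for row in matmul(src, transpose(tgt))]
    # Source attends to target: normalize each row of sim.
    src_attn = [softmax(row) for row in sim]
    # Target attends to source: normalize each row of sim transposed.
    tgt_attn = [softmax(row) for row in transpose(sim)]
    new_src = matmul(src_attn, tgt)  # n x d
    new_tgt = matmul(tgt_attn, src)  # m x d
    return new_src, new_tgt

src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
tgt = [[1.0, 0.0], [0.0, 1.0]]
new_src, new_tgt = bidirectional_cross_attention(src, tgt)
```

Reusing the similarity matrix for both directions is what makes the update a single step; a real implementation would typically add learned projections and multiple heads.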
Alternatives and similar repositories for bidirectional-cross-attention
Users that are interested in bidirectional-cross-attention are comparing it to the libraries listed below.
- [NeurIPS 2024] Official implementation of paper "Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers" ☆20 · Mar 10, 2025 · Updated last year
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in Pytorch ☆76 · Dec 4, 2022 · Updated 3 years ago
- An implementation of local windowed attention for language modeling ☆498 · Jul 16, 2025 · Updated 8 months ago
- Implementation of Tranception, an attention network, paired with retrieval, that is SOTA for protein fitness prediction ☆32 · Jun 19, 2022 · Updated 3 years ago
- Graph neural network message passing reframed as a Transformer with local attention ☆70 · Dec 24, 2022 · Updated 3 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆220 · Feb 13, 2023 · Updated 3 years ago
- Explorations into the recently proposed Taylor Series Linear Attention ☆100 · Aug 18, 2024 · Updated last year
- My own attempt at a long context genomics model, leveraging recent advances in long context attention modeling (Flash Attention + other h… ☆54 · Jul 2, 2023 · Updated 2 years ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Oct 22, 2023 · Updated 2 years ago
- Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" ☆389 · Jul 18, 2023 · Updated 2 years ago
- Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch ☆97 · Feb 19, 2021 · Updated 5 years ago
- Visual Domain Adaptation with Manifold Embedded Distribution Alignment (ACM MM'18) ☆22 · Jan 23, 2019 · Updated 7 years ago
- A Pytorch Lightning WGAN-gp to generate faces ☆11 · Jan 26, 2021 · Updated 5 years ago
- Revisiting Multimodal Emotion Recognition in Conversation from the Perspective of Graph Spectrum ☆30 · Dec 15, 2024 · Updated last year
- Pytorch implementation of Compressive Transformers, from Deepmind ☆163 · Oct 4, 2021 · Updated 4 years ago
- Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena ☆207 · Aug 26, 2023 · Updated 2 years ago
- Don't mix, match! Simple utilities for improved registration of Histopathology Whole Slide Images. ☆11 · Oct 11, 2020 · Updated 5 years ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆59 · Oct 22, 2023 · Updated 2 years ago
- A simple implementation of a deep linear Pytorch module ☆21 · Oct 16, 2020 · Updated 5 years ago
- Implementation of Discrete Key / Value Bottleneck, in Pytorch ☆88 · Jul 9, 2023 · Updated 2 years ago
- Implementation of Uformer, Attention-based Unet, in Pytorch ☆96 · Oct 26, 2021 · Updated 4 years ago
- [SpeechCom Journal] Learning and controlling the source-filter representation of speech with a variational autoencoder ☆45 · Apr 18, 2023 · Updated 2 years ago
- A library of speech gadgets. ☆14 · Oct 15, 2022 · Updated 3 years ago
- Pytorch implementation of "Very Deep Graph Neural Networks via Noise Regularisation" ☆10 · Aug 22, 2021 · Updated 4 years ago
- Implementation of Recurrent Interface Network (RIN), for highly efficient generation of images and video without cascading networks, in P… ☆207 · Feb 14, 2024 · Updated 2 years ago
- Implementation of Nyström Self-attention, from the paper Nyströmformer ☆145 · Mar 24, 2025 · Updated last year
- Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning ☆166 · Feb 12, 2024 · Updated 2 years ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆15 · Apr 30, 2025 · Updated 10 months ago
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆804 · Jan 30, 2026 · Updated last month
- Pytorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆51 · Jul 26, 2024 · Updated last year
- Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 · Nov 11, 2024 · Updated last year
- ☆17 · Mar 31, 2020 · Updated 5 years ago
- ☆17 · Jun 20, 2022 · Updated 3 years ago
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆102 · Feb 25, 2023 · Updated 3 years ago
- Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch ☆231 · Sep 6, 2024 · Updated last year
- Implementation of the convolutional module from the Conformer paper, for use in Transformers ☆433 · May 17, 2023 · Updated 2 years ago
- Axial Positional Embedding for Pytorch ☆84 · Feb 25, 2025 · Updated last year
- Utilities for PyTorch distributed ☆25 · Feb 27, 2025 · Updated last year
- ☆14 · Mar 29, 2022 · Updated 3 years ago