gmongaras / Cottention_Transformer
Code for the paper "Cottention: Linear Transformers With Cosine Attention"
☆20 · Updated 2 months ago
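Going by the paper title alone, Cottention replaces the softmax in attention with cosine similarity between queries and keys, which lets the usual (QKᵀ)V product be reordered as Q(KᵀV) for cost linear in sequence length. The snippet below is a minimal, non-causal sketch of that general idea, not the repository's code: `cosine_linear_attention` is a hypothetical name, and any normalization or masking details specific to the paper are omitted.

```python
# Minimal sketch of cosine attention in linear form (illustrative only,
# not the Cottention_Transformer implementation).
import torch
import torch.nn.functional as F

def cosine_linear_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (batch, seq_len, dim). Returns (batch, seq_len, dim)."""
    q = F.normalize(q, dim=-1)  # unit-norm queries -> Q @ K^T holds cosine similarities
    k = F.normalize(k, dim=-1)  # unit-norm keys
    # With no softmax, (Q K^T) V can be reordered as Q (K^T V):
    kv = torch.einsum("bnd,bne->bde", k, v)   # (batch, dim, dim) summary, O(n * d^2)
    return torch.einsum("bnd,bde->bne", q, kv)

# Example usage
q, k, v = (torch.randn(2, 128, 64) for _ in range(3))
out = cosine_linear_attention(q, k, v)  # shape (2, 128, 64)
```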
Alternatives and similar repositories for Cottention_Transformer
Users interested in Cottention_Transformer are comparing it to the libraries listed below.
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆47 · Updated last year
- A simple torch implementation of high-performance Multi-Query Attention ☆16 · Updated 2 years ago
- A repository for testing various linear attention designs ☆62 · Updated last year
- Awesome Triton Resources ☆39 · Updated 9 months ago
- User-friendly implementation of the Mixture-of-Sparse-Attention (MoSA). MoSA selects distinct tokens for each head with expert choice rou… ☆28 · Updated 9 months ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… ☆54 · Updated 3 weeks ago
- ☆91 · Updated last year
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆59 · Updated 10 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Updated last year
- Official PyTorch implementation for "Vision-Language Models Create Cross-Modal Task Representations", ICML 2025