gmongaras / Cottention_Transformer
Code for the paper "Cottention: Linear Transformers With Cosine Attention"
☆13 · Updated 3 months ago
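For context, the paper title describes replacing softmax attention with cosine similarity between queries and keys, which allows attention to be computed in linear rather than quadratic time. Below is a minimal, non-causal sketch of that general idea, assuming PyTorch and a (batch, heads, seq, dim) tensor layout; it is illustrative only, and the repository's actual implementation (including any causal variant or normalization scheme) will differ.

```python
# A minimal sketch of the cosine-attention idea suggested by the paper
# title, NOT the official Cottention implementation. The tensor layout
# (batch, heads, seq, dim) and function name are illustrative assumptions.
import torch
import torch.nn.functional as F

def cosine_linear_attention(q, k, v):
    # Cosine similarity is the dot product of L2-normalized vectors,
    # so normalizing Q and K stands in for the softmax over scores.
    q = F.normalize(q, dim=-1)
    k = F.normalize(k, dim=-1)
    # Without softmax, (Q K^T) V can be reassociated as Q (K^T V),
    # reducing cost from O(n^2 * d) to O(n * d^2) in sequence length n.
    kv = torch.einsum('bhnd,bhne->bhde', k, v)  # (batch, heads, dim, dim_v)
    return torch.einsum('bhnd,bhde->bhne', q, kv)

# Sanity check: for short sequences this matches the quadratic form.
q = k = v = torch.randn(1, 2, 8, 16)
quadratic = torch.einsum(
    'bhnd,bhmd->bhnm',
    F.normalize(q, dim=-1), F.normalize(k, dim=-1),
) @ v
assert torch.allclose(cosine_linear_attention(q, k, v), quadratic, atol=1e-5)
```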
Alternatives and similar repositories for Cottention_Transformer:
Users interested in Cottention_Transformer are comparing it to the libraries listed below.
- Official implementation of the ECCV 2024 paper "POA" ☆24 · Updated 5 months ago
- A simple PyTorch implementation of high-performance Multi-Query Attention ☆16 · Updated last year
- ☆15 · Updated last week
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆25 · Updated 9 months ago
- Official repository for continuous speculative decoding ☆21 · Updated 2 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆23 · Updated 6 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆52 · Updated 5 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆35 · Updated this week
- Here we will test various linear attention designs. ☆58 · Updated 8 months ago
- Official code repository for the paper "Key-value memory in the brain" ☆20 · Updated last week
- Official PyTorch implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxiang Li, Lu Yi… ☆16 · Updated 3 weeks ago
- Using FlexAttention to compute attention with different masking patterns ☆40 · Updated 3 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆35 · Updated last year
- Official implementation of "Preference Alignment with Flow Matching" (NeurIPS 2024) ☆20 · Updated 2 months ago
- Official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆33 · Updated 3 months ago
- Some personal experiments around routing tokens to different autoregressive attention modules, akin to mixture-of-experts ☆112 · Updated 3 months ago
- Explorations into improving ViTArc with Slot Attention ☆37 · Updated 3 months ago
- Awesome Triton Resources ☆19 · Updated last month
- ☆37 · Updated 9 months ago
- A repository for research on medium-sized language models ☆76 · Updated 7 months ago
- Papers for a comprehensive survey of accelerated generation techniques in Large Language Models (LLMs) ☆12 · Updated 7 months ago
- Implementation of an attention layer where each head can attend to more than just one token, using coordinate descent to pick the top-k ☆46 · Updated last year
- ☆32 · Updated last year
- Official PyTorch implementation of "Task Vectors are Cross-Modal" ☆21 · Updated last month
- ☆24 · Updated 3 months ago
- ☆12 · Updated this week
- ☆25 · Updated last year
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆21 · Updated 9 months ago
- ☆30 · Updated 8 months ago
- ☆29 · Updated 10 months ago