TsinghuaC3I / Fourier-Position-Embedding
Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆13Updated last month
Alternatives and similar repositories for Fourier-Position-Embedding:
Users who are interested in Fourier-Position-Embedding are comparing it to the libraries listed below:
- Here we will test various linear attention designs.☆58Updated 9 months ago
- This is a simple torch implementation of the high-performance Multi-Query Attention☆16Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore☆24Updated 5 months ago
- Official code for the paper "Attention as a Hypernetwork"☆24Updated 7 months ago
- Official implementation of ECCV24 paper: POA☆24Updated 6 months ago
- A repository for research on medium sized language models.☆76Updated 8 months ago
- Implementation of the proposed MaskBit from Bytedance AI☆75Updated 3 months ago
- HGRN2: Gated Linear RNNs with State Expansion☆52Updated 6 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders☆19Updated 3 weeks ago
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆31Updated 6 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"☆27Updated 10 months ago
- Minimal Implementation of Visual Autoregressive Modelling (VAR)☆26Updated last month
- Triton implementation of bi-directional (non-causal) linear attention☆41Updated 2 weeks ago
- Unofficial Implementation of Selective Attention Transformer☆15Updated 3 months ago
- Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)☆53Updated 10 months ago
- Official implementation of the ICML 2024 paper RoSA (Robust Adaptation)☆38Updated last year
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public☆73Updated this week
- Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.☆57Updated 2 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"☆96Updated 4 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,…☆42Updated 7 months ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters"☆17Updated this week
- NeuMeta transforms neural networks by allowing a single model to adapt on the fly to different sizes, generating the right weights when n…☆39Updated 3 months ago
- A general framework for inference-time scaling and steering of diffusion models with arbitrary rewards.☆87Updated this week
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind☆118Updated 5 months ago
- Explorations into improving ViTArc with Slot Attention☆37Updated 4 months ago
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆35Updated 4 months ago