LUMIA-Group / FourierTransformer
The official PyTorch implementation of the paper "Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator" (ACL 2023 Findings)
☆28 · Updated 6 months ago
Related projects:
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆38 · Updated last year
- Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆48 · Updated last week
- A repository for DenseSSMs ☆86 · Updated 5 months ago
- Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States ☆33 · Updated 2 months ago
- State Space Models ☆55 · Updated 4 months ago
- Official Code for ICLR 2024 Paper: Non-negative Contrastive Learning ☆36 · Updated 5 months ago
- My implementation of the original transformer model (Vaswani et al.). I've additionally included the playground.py file for visualizing o… ☆36 · Updated 9 months ago
- Source code of EMNLP 2022 Findings paper "SparseAdapter: An Easy Approach for Improving the Parameter-Efficiency of Adapters" ☆18 · Updated 5 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆46 · Updated last month
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆42 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆62 · Updated 11 months ago
- A curated list of Model Merging methods. ☆71 · Updated this week
- A Triton Kernel for incorporating Bi-Directionality in Mamba2 ☆43 · Updated last week
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆94 · Updated last month
- CatMAE ☆12 · Updated 9 months ago
- Awesome Learn From Model Beyond Fine-Tuning: A Survey ☆44 · Updated 9 months ago
- Open source community's implementation of the model from "LANGUAGE MODEL BEATS DIFFUSION — TOKENIZER IS KEY TO VISUAL GENERATION" ☆15 · Updated last week
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆60 · Updated 4 months ago
- [ACL 2023] Code for paper "Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation" (https://arxiv.org/abs/2305.… ☆38 · Updated last year
- The source code of the EMNLP 2023 main conference paper: Sparse Low-rank Adaptation of Pre-trained Language Models. ☆62 · Updated 6 months ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆28 · Updated last year
- [ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691) ☆111 · Updated 6 months ago
- MambaFormer in-context learning experiments and implementation for https://arxiv.org/abs/2402.04248 ☆33 · Updated 3 months ago
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆17 · Updated this week
- Official code for the paper "Attention as a Hypernetwork" ☆20 · Updated 2 months ago