A repository for log-time feedforward networks
☆224 · Apr 9, 2024 · Updated last year
Alternatives and similar repositories for fastfeedforward
Users interested in fastfeedforward are comparing it to the libraries listed below.
- The repository for the code of the UltraFastBERT paper ☆519 · Mar 24, 2024 · Updated last year
- FastFeedForward Networks ☆20 · Dec 8, 2023 · Updated 2 years ago
- Zeta implementation of a reusable and plug-and-play feedforward from the paper "Exponentially Faster Language Modeling" ☆16 · Nov 11, 2024 · Updated last year
- ☆13 · Aug 23, 2024 · Updated last year
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆25 · Oct 13, 2025 · Updated 4 months ago
- Brainwave is a state-of-the-art neural decoder that transforms electroencephalogram (EEG) and brain signals into multimodal outputs inclu… ☆14 · Oct 6, 2025 · Updated 4 months ago
- Yet another LLM ☆10 · Apr 6, 2023 · Updated 2 years ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆226 · Sep 18, 2025 · Updated 5 months ago
- Fast approximate inference on a single GPU with sparsity-aware offloading ☆39 · Jan 4, 2024 · Updated 2 years ago
- Some common Huggingface transformers in maximal update parametrization (µP) ☆87 · Mar 14, 2022 · Updated 3 years ago
- Tiktok is an advanced multimedia recommender system that fuses the generative modality-aware collaborative self-augmentation and contrast… ☆14 · Aug 18, 2023 · Updated 2 years ago
- An implementation of the paper Brain2Qwerty that translates brain EEG data into text for reading people's brains. There was no code so we… ☆22 · Feb 9, 2025 · Updated last year
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆58 · Feb 9, 2026 · Updated 3 weeks ago
- Beyond Language Models: Byte Models are Digital World Simulators ☆335 · Jun 6, 2024 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆397 · Feb 24, 2024 · Updated 2 years ago
- Implementation of "Audio xLSTMs: Learning Self-supervised audio representations with xLSTMs" in PyTorch ☆19 · Feb 9, 2026 · Updated 3 weeks ago
- Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates ☆473 · Apr 21, 2024 · Updated last year
- Convolutions for Sequence Modeling ☆913 · Jun 13, 2024 · Updated last year
- Code repository for the paper "MrT5: Dynamic Token Merging for Efficient Byte-level Language Models" ☆54 · Sep 25, 2025 · Updated 5 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Jun 12, 2024 · Updated last year
- ☆35 · Apr 12, 2024 · Updated last year
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆562 · Dec 28, 2024 · Updated last year
- ☆50 · Mar 14, 2024 · Updated last year
- Extend existing LLMs far beyond their original training length with constant memory usage, without retraining ☆736 · Apr 10, 2024 · Updated last year
- ☆83 · Apr 16, 2024 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆280 · Nov 3, 2023 · Updated 2 years ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆343 · Dec 28, 2024 · Updated last year
- Minimalistic large language model 3D-parallelism training ☆2,579 · Feb 19, 2026 · Updated last week
- A tool for generic tracking-based CV annotation ☆18 · Jan 27, 2021 · Updated 5 years ago
- Cramming the training of a (BERT-type) language model into limited compute ☆1,363 · Jun 13, 2024 · Updated last year
- 🌏 Modular retrievers for zero-shot multilingual IR ☆30 · Mar 6, 2024 · Updated last year
- ☆316 · Jun 21, 2024 · Updated last year
- The PyTorch implementation of the paper "KERMIT: Knowledge Graph Completion of Enhanced Relation Modeling with Inverse Transformation" ☆15 · Jul 4, 2025 · Updated 8 months ago
- Simple-to-use scoring function for arbitrarily tokenized texts ☆47 · Feb 19, 2025 · Updated last year
- A byte-level decoder architecture that matches the performance of tokenized Transformers ☆67 · Apr 24, 2024 · Updated last year
- Unofficial PyTorch/🤗Transformers (Gemma/Llama3) implementation of Leave No Context Behind: Efficient Infinite Context Transformers with I… ☆375 · Apr 23, 2024 · Updated last year
- Unleash the full potential of exascale LLMs on consumer-class GPUs, proven by extensive benchmarks, with no long-term adjustments and min… ☆26 · Nov 11, 2024 · Updated last year
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,673 · Apr 17, 2024 · Updated last year
- maximal update parametrization (µP) ☆1,686 · Jul 17, 2024 · Updated last year