eth-easl / fmengine
Utilities for Training Very Large Models
Related projects:
- CUDA and Triton implementations of Flash Attention with SoftmaxN
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
- My explorations into editing the knowledge and memories of an attention network
- Simple and efficient PyTorch-native transformer training and inference (batched)
- Using FlexAttention to compute attention with different masking patterns
- Collection of autoregressive model implementations
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…
- Triton implementation of the HyperAttention algorithm
- Here we will test various linear attention designs.
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
- A place to store reusable transformer components of my own creation or found on the interwebs
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
- Experiment of using Tangent to autodiff Triton
- Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry
- CUDA implementation of autoregressive linear attention, with all the latest research findings
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024)
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
- A fast implementation of T5/UL2 in PyTorch using Flash Attention
- Large-scale distributed model training strategy with Colossal AI and Lightning AI