CerebrasResearch / Sparse-IFT
Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency
☆25 · Updated 7 months ago
Alternatives and similar repositories for Sparse-IFT:
Users interested in Sparse-IFT are comparing it to the repositories listed below
- Experiment of using Tangent to autodiff triton ☆76 · Updated last year
- ☆125 · Updated last year
- Code for studying the super weight in LLM ☆91 · Updated 3 months ago
- Understand and test language model architectures on synthetic tasks. ☆183 · Updated last week
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆121 · Updated 7 months ago
- ☆49 · Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆107 · Updated 2 months ago
- Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount… ☆53 · Updated last year
- ☆37 · Updated 11 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated 11 months ago
- PyTorch library for Active Fine-Tuning ☆59 · Updated 3 weeks ago
- Proof-of-concept of global switching between numpy/jax/pytorch in a library. ☆18 · Updated 8 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 · Updated 5 months ago
- Triton Implementation of HyperAttention Algorithm ☆47 · Updated last year
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆121 · Updated 6 months ago
- A fusion of a linear layer and a cross entropy loss, written for PyTorch in Triton. ☆63 · Updated 7 months ago
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆64 · Updated 3 months ago
- Collection of autoregressive model implementations ☆83 · Updated last month
- See https://github.com/cuda-mode/triton-index/ instead! ☆11 · Updated 10 months ago
- ☆73 · Updated 10 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆59 · Updated 5 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆231 · Updated 2 weeks ago
- Make triton easier ☆47 · Updated 9 months ago
- ☆51 · Updated 9 months ago
- Token Omission Via Attention ☆124 · Updated 5 months ago
- ☆75 · Updated 8 months ago
- Experiments for efforts to train a new and improved T5 ☆77 · Updated 11 months ago
- ☆73 · Updated 6 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆207 · Updated 3 months ago