IntelLabs / DyNAS-T
Dynamic Neural Architecture Search Toolkit
☆29 Updated 3 months ago
Alternatives and similar repositories for DyNAS-T:
Users who are interested in DyNAS-T are comparing it to the libraries listed below.
- ☆31 Updated 9 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆25 Updated last year
- ACL 2023 ☆39 Updated last year
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023) ☆30 Updated last year
- ☆20 Updated 2 years ago
- Spartan is an algorithm for training sparse neural network models. This repository accompanies the paper "Spartan Differentiable Sparsity… ☆24 Updated 2 years ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆28 Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 Updated last month
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated last year
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆18 Updated last year
- Official PyTorch implementation of LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification ☆46 Updated 2 years ago
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆48 Updated last year
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆22 Updated 9 months ago
- Compression schema for gradients of activations in backward pass ☆44 Updated last year
- ☆51 Updated 9 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆30 Updated 9 months ago
- ☆23 Updated 8 months ago
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… ☆40 Updated 2 years ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 Updated last year
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆46 Updated last year
- A block oriented training approach for inference time optimization. ☆32 Updated 7 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 Updated 8 months ago
- [NeurIPS 2022 Spotlight] This is the official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity" ☆71 Updated 2 years ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆68 Updated 9 months ago
- ☆29 Updated 2 years ago
- ☆46 Updated last week
- Hacks for PyTorch ☆19 Updated last year
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 Updated last year
- ☆19 Updated 3 years ago