IntelLabs / DyNAS-T
Dynamic Neural Architecture Search Toolkit
☆29 Updated 5 months ago
Related projects
Alternatives and complementary repositories for DyNAS-T
- ☆30 Updated 4 months ago
- ☆18 Updated 3 years ago
- Official PyTorch implementation of LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification ☆45 Updated 2 years ago
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆43 Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 Updated last year
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… ☆40 Updated last year
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023) ☆30 Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆38 Updated 9 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆107 Updated 5 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆59 Updated 7 months ago
- A block oriented training approach for inference time optimization. ☆29 Updated 2 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆85 Updated 3 weeks ago
- ☆20 Updated 2 years ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆19 Updated 2 years ago
- Prototype routines for GPU quantization written using PyTorch. ☆19 Updated last week
- Repository for CPU Kernel Generation for LLM Inference ☆24 Updated last year
- ☆49 Updated last year
- Patch convolution to avoid large GPU memory usage of Conv2D ☆79 Updated 5 months ago
- ☆21 Updated 3 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆35 Updated 3 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆46 Updated this week
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 Updated last year
- ACL 2023 ☆38 Updated last year
- ☆50 Updated 4 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆19 Updated 7 months ago
- ☆17 Updated 3 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 Updated 7 months ago
- ☆24 Updated 2 years ago
- Experiment of using Tangent to autodiff triton ☆71 Updated 9 months ago
- A library for unit scaling in PyTorch ☆105 Updated this week