IntelLabs / DyNAS-T — Dynamic Neural Architecture Search Toolkit (☆29 · Updated 5 months ago)

Related projects

Alternatives and complementary repositories for DyNAS-T:
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… (☆40 · Updated last year)
- ☆18 · Updated 3 years ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization (☆87 · Updated last month)
- Memory Optimizations for Deep Learning (ICML 2023) (☆60 · Updated 8 months ago)
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry (☆38 · Updated 10 months ago)
- A block-oriented training approach for inference-time optimization (☆30 · Updated 3 months ago)
- Flexible simulator for mixed-precision and format simulation of LLMs and vision transformers (☆43 · Updated last year)
- ☆32 · Updated this week
- ☆22 · Updated this week
- ☆11 · Updated 2 years ago
- Official PyTorch implementation of LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification (☆46 · Updated 2 years ago)
- ☆20 · Updated 2 years ago
- ☆30 · Updated 5 months ago
- The implementation for the MLSys 2023 paper "Cuttlefish: Low-Rank Model Training without All The Tuning" (☆43 · Updated last year)
- Code for studying the super weight in LLMs (☆16 · Updated last week)
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…" (☆19 · Updated 8 months ago)
- ☆24 · Updated 2 years ago
- ☆18 · Updated 4 months ago
- Pruner-Zero: Evolving Symbolic Pruning Metric from Scratch for LLMs (☆74 · Updated 5 months ago)
- LLM KV cache compression made easy (☆64 · Updated last week)
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models (☆26 · Updated 5 months ago)
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa… (☆31 · Updated last year)
- ☆42 · Updated 9 months ago
- CUDA implementation of autoregressive linear attention, incorporating the latest research findings (☆43 · Updated last year)
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023) (☆30 · Updated last year)
- Prototype routines for GPU quantization written in PyTorch (☆19 · Updated last week)
- Revisiting Efficient Training Algorithms for Transformer-Based Language Models (NeurIPS 2023) (☆79 · Updated last year)
- ☆51 · Updated 5 months ago
- Activation-Aware Singular Value Decomposition for Compressing Large Language Models (☆49 · Updated last month)
- Code for an ICML 2021 submission (☆35 · Updated 3 years ago)