ShishirPatil / poet
ML model training for edge devices
☆157 · Updated last year
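poet fits training within an edge device's memory budget by combining rematerialization (gradient checkpointing) with paging. As a generic point of reference only — this is standard PyTorch, not poet's own API, and the module names are illustrative — the rematerialization idea can be sketched with the built-in `torch.utils.checkpoint`:

```python
# Minimal sketch of rematerialization (gradient checkpointing) using
# PyTorch's built-in torch.utils.checkpoint. NOT poet's API: poet solves
# for an optimal remat/paging schedule rather than checkpointing every block.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    def __init__(self, dim: int = 256, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Discard this block's activations after the forward pass and
            # recompute them during backward, trading compute for memory.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLP()
x = torch.randn(32, 256, requires_grad=True)
model(x).sum().backward()  # activations are rematerialized here
```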
Related projects
Alternatives and complementary repositories for poet
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆146 · Updated this week
- A schedule language for large model training ☆141 · Updated 5 months ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆181 · Updated last year
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆356 · Updated last week
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks ☆55 · Updated this week
- Memory Optimizations for Deep Learning (ICML 2023) ☆60 · Updated 8 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆112 · Updated 8 months ago
- AI and Memory Wall ☆206 · Updated 8 months ago
- Deep Learning Energy Measurement and Optimization ☆218 · Updated this week
- Applied AI experiments and examples for PyTorch ☆168 · Updated 3 weeks ago
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models (ICML 2021) ☆55 · Updated 3 years ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆131 · Updated last year
- Integer operators on GPUs for PyTorch ☆184 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" (for the round-to-nearest baseline such methods improve on, see the sketch after this list) ☆350 · Updated 8 months ago
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" ☆129 · Updated 11 months ago
- Reorder-based post-training quantization for large language models ☆181 · Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆278 · Updated 4 months ago
- Fast Inference of MoE Models with CPU-GPU Orchestration ☆173 · Updated this week
- Flexible mixed-precision and number-format simulator for LLMs and vision transformers ☆43 · Updated last year
- Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning" ☆104 · Updated last year
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5) ☆211 · Updated 3 weeks ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters ☆34 · Updated 2 years ago
- Research and development for optimizing transformers ☆125 · Updated 3 years ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆149 · Updated 4 months ago
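Several of the entries above (QuIP, RPTQ, Atom, the FP6/FP5 kernels) are post-training quantization systems. For orientation only, the per-tensor round-to-nearest int8 baseline they all improve on can be written in a few lines of plain PyTorch — a toy sketch, not the API of any listed repository, with hypothetical helper names:

```python
# Toy per-tensor symmetric int8 post-training quantization in plain PyTorch.
# Illustrates the round-to-nearest baseline; real systems above use finer
# granularity, reordering, or incoherence processing on top of this idea.
import torch

def quantize_int8(w: torch.Tensor):
    # Scale so the largest magnitude maps to 127 (symmetric range).
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
err = (dequantize(q, scale) - w).abs().mean()
print(f"mean abs quantization error: {err.item():.6f}")
```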