ShishirPatil / poet
ML model training for edge devices
☆159 · Updated last year
Alternatives and similar repositories for poet:
Users interested in poet are comparing it to the libraries listed below.
- A schedule language for large model training (☆143, updated 7 months ago)
- Interactive performance profiling and debugging tool for PyTorch neural networks (☆57, updated this week)
- A library to analyze PyTorch traces (☆324, updated last month)
- (☆244, updated 5 months ago)
- (☆96, updated 4 months ago)
- [ICML 2021] PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models (☆55, updated 3 years ago)
- (☆92, updated 2 years ago)
- A flexible simulator for mixed-precision and numeric-format experiments on LLMs and vision transformers (☆47, updated last year)
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity (☆195, updated last year)
- AI and Memory Wall (☆211, updated 9 months ago)
- A Python library that transfers PyTorch tensors between CPU and NVMe (☆102, updated last month)
- Training neural networks in TensorFlow 2.0 with 5x less memory; see the activation-checkpointing sketch after this list (☆130, updated 2 years ago)
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware (☆103, updated last month)
- [NeurIPS 2022] Automatically finding good model-parallel strategies, especially for complex models and clusters (☆37, updated 2 years ago)
- [ICML 2023] Memory Optimizations for Deep Learning (☆62, updated 10 months ago)
- A safetensors extension to efficiently store sparse quantized tensors on disk (☆64, updated this week)
- (☆140, updated last year)
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving (☆291, updated 6 months ago)
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… (☆152, updated last month)
- SparseTIR: Sparse Tensor Compiler for Deep Learning (☆133, updated last year)
- (☆43, updated 6 months ago)
- Home for OctoML PyTorch Profiler (☆107, updated last year)
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large… (☆63, updated 2 years ago)
- Reorder-based post-training quantization for large language models; a minimal quantization sketch follows this list (☆183, updated last year)
- Simple Distributed Deep Learning on TensorFlow (☆134, updated 2 years ago)
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" (☆133, updated last year)
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization (☆667, updated 5 months ago)
- (☆157, updated last year)
- (☆394, updated 3 months ago)
- (☆141, updated last week)
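
Several entries above (the TensorFlow memory-saving library, the ICML 2023 memory optimizations, and poet itself) revolve around trading compute for activation memory. As a point of reference, here is a minimal, hedged sketch of that idea using PyTorch's stock `torch.utils.checkpoint` API; it is not the API of any repository listed here, and the model and sizes are invented for illustration.

```python
# Minimal sketch of activation checkpointing, the general technique behind
# "training with Nx less memory": activations inside each checkpointed block
# are discarded after the forward pass and recomputed during backward.
# Uses only stock PyTorch (torch.utils.checkpoint); nothing repo-specific.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):
    def __init__(self, dim: int = 1024, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # use_reentrant=False is the recommended mode in recent PyTorch
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLP()
x = torch.randn(32, 1024, requires_grad=True)
model(x).sum().backward()  # each block's forward is recomputed during backward
```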
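
Likewise, the quantization entries (the reorder-based post-training quantization project, Atom, SqueezeLLM) all build on the same primitive: mapping float weights to low-bit integers with a scale. The sketch below shows plain symmetric per-tensor int8 quantization as a baseline; the function names are illustrative, not from any listed repository, and those projects add substantially more machinery (reordering, dense-and-sparse splits, activation handling) on top.

```python
# Baseline sketch: symmetric per-tensor int8 post-training quantization.
# Function names are illustrative, not from any listed repository.
import torch

def quantize_int8(w: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    scale = w.abs().max() / 127.0              # map the max magnitude to 127
    q = torch.clamp((w / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
err = (w - dequantize(q, scale)).abs().max()
print(f"max abs error: {err:.5f} (roughly scale/2 = {scale / 2:.5f})")
```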