ShishirPatil / poet
ML model training for edge devices
★165 · Updated last year
Alternatives and similar repositories for poet
Users interested in poet are comparing it to the libraries listed below.
- ★157 · Updated last year
- Interactive performance profiling and debugging tool for PyTorch neural networks (★64 · Updated 6 months ago)
- Home for OctoML PyTorch Profiler (★113 · Updated 2 years ago)
- ★153 · Updated 2 years ago
- ★106 · Updated 10 months ago
- ★120 · Updated last year
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware (★110 · Updated 7 months ago)
- ★94 · Updated 3 years ago
- GPTQ inference Triton kernel (★302 · Updated 2 years ago)
- ★74 · Updated 3 months ago
- This repository contains the experimental PyTorch native float8 training UX (★224 · Updated 11 months ago)
- A Python library that transfers PyTorch tensors between CPU and NVMe (★117 · Updated 7 months ago)
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" (★373 · Updated last year)
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity (★216 · Updated last year)
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… (★158 · Updated 3 weeks ago)
- The official implementation of the EMNLP 2023 paper LLM-FP4 (★210 · Updated last year)
- Training neural networks in TensorFlow 2.0 with 5x less memory (★132 · Updated 3 years ago)
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving (★314 · Updated last year)
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components (★207 · Updated this week)
- PB-LLM: Partially Binarized Large Language Models (★152 · Updated last year)
- ★251 · Updated 11 months ago
- ★120 · Updated last year
- Reorder-based post-training quantization for large language models (★192 · Updated 2 years ago)
- AI and Memory Wall (★216 · Updated last year)
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" (★277 · Updated last year)
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration (★218 · Updated 8 months ago)
- ★152 · Updated 2 years ago
- Memory Optimizations for Deep Learning (ICML 2023) (★98 · Updated last year)
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models (ICML 2021) (★56 · Updated 4 years ago)
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers (★51 · Updated 2 years ago)