ShishirPatil / poet

ML model training for edge devices

☆167

Alternatives and similar repositories for poet

Users who are interested in poet are comparing it to the libraries listed below.

mlc-ai / llm-perf-bench
☆120 Updated last year
haochengxi / Train_Transformers_with_INT4
☆157 Updated 2 years ago
lightmatter-ai / INT-FP-QSim
Flexible simulator for mixed precision and format simulation of LLMs and vision transformers.
☆51 Updated 2 years ago
CentML / DeepView.Profile
🏙 Interactive performance profiling and debugging tool for PyTorch neural networks.
☆64 Updated 9 months ago
Cornell-RelaxML / QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆387 Updated last year
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆158 Updated 2 years ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆156 Updated 2 years ago
octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆114 Updated 2 years ago
DS3Lab / DT-FM
☆93 Updated 3 years ago
stanford-futuredata / stk
☆113 Updated last year
hahnyuan / RPTQ4LLM
Reorder-based post-training quantization for large language model
☆194 Updated 2 years ago
efeslab / Atom
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
☆328 Updated last year
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110 Updated last year
awslabs / slapo
A schedule language for large model training
☆151 Updated 2 months ago
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118 Updated last year
yandex-research / swarm
Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient"
☆145 Updated last year
efeslab / fiddler
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆240 Updated last year
hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆121 Updated 11 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆216 Updated last week
fpgaminer / GPTQ-triton
GPTQ inference Triton kernel
☆313 Updated 2 years ago
deepspeedai / DeepSpeed-Kernels
☆71 Updated 7 months ago
IntelLabs / FP8-Emulation-Toolkit
PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware.
☆112 Updated 11 months ago
IST-DASLab / QUIK
Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024
☆183 Updated last year
DS3Lab / CocktailSGD
☆27 Updated 2 years ago
Distributed-AI / PipeTransformer
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021
☆56 Updated 4 years ago
Raincleared-Song / sparse_gpu_operator
GPU operators for sparse tensor operations
☆35 Updated last year
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 Updated 2 years ago
Macaronlin / LLaMA3-Quantization
A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods.
☆196 Updated 10 months ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆223 Updated 2 years ago
facebookresearch / MODel_opt
Memory Optimizations for Deep Learning (ICML 2023)
☆110 Updated last year