Mystery-Golden-Retriever / PipeEdge
PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices
☆16 · Updated 4 months ago
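PipeEdge's premise is pipeline-parallel inference: a large model is cut into contiguous stages, each stage is placed on a different edge device, and activations flow from device to device so that heterogeneous hardware can collectively serve a model no single device could hold. As a rough illustration only (this is not PipeEdge's partitioning algorithm or API; `layer_costs`, `device_speeds`, and `partition_stages` are made-up names), a greedy stage partitioner can be sketched as:

```python
# Illustrative sketch, not PipeEdge's implementation: split a model's layers
# into contiguous pipeline stages, giving each device a share of the total
# work roughly proportional to its relative speed.

def partition_stages(layer_costs, device_speeds):
    """Assign contiguous layer ranges to devices.

    layer_costs:   per-layer compute cost (e.g. FLOPs), in execution order.
    device_speeds: relative throughput of each device (higher = faster).
    Returns one (start, end) layer-index range per device.
    """
    total_cost = sum(layer_costs)
    total_speed = sum(device_speeds)
    stages = []
    start = 0
    for d, speed in enumerate(device_speeds):
        if d == len(device_speeds) - 1:
            # Last device takes every remaining layer.
            stages.append((start, len(layer_costs)))
            break
        target = total_cost * speed / total_speed  # this device's share of work
        acc = 0.0
        end = start
        while end < len(layer_costs) and acc + layer_costs[end] <= target + 1e-9:
            acc += layer_costs[end]
            end += 1
        if end == start and start < len(layer_costs):
            end += 1  # every stage gets at least one layer
        stages.append((start, end))
        start = end
    return stages

if __name__ == "__main__":
    layer_costs = [1.0] * 12          # e.g. 12 transformer blocks of equal cost
    device_speeds = [1.0, 2.0, 1.0]   # middle device is twice as fast
    print(partition_stages(layer_costs, device_speeds))
    # -> [(0, 3), (3, 9), (9, 12)]
```

Running the example splits a 12-layer model across three devices roughly in proportion to their speeds; a real system would also need to account for inter-device link bandwidth and per-device memory limits.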
Alternatives and similar repositories for PipeEdge
Users interested in PipeEdge are comparing it to the libraries listed below.
- GPU programming related news and material links · ☆1,625 · Updated 6 months ago
- ☆171 · Updated 11 months ago
- ☆1,276 · Updated 2 weeks ago
- Large Language Model (LLM) Systems Paper List · ☆1,362 · Updated last week
- Extra notebooks for ECE-GY 6143 · ☆24 · Updated last week
- Fast CUDA matrix multiplication from scratch · ☆771 · Updated last year
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) · ☆816 · Updated 11 months ago
- 100 days of building GPU kernels! · ☆462 · Updated 2 months ago
- List of papers related to neural network quantization in recent AI conferences and journals. · ☆669 · Updated 3 months ago
- Puzzles for learning Triton · ☆1,760 · Updated 8 months ago
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla… · ☆2,561 · Updated this week
- Curated collection of papers in machine learning systems · ☆384 · Updated last month
- Awesome LLM compression research papers and tools. · ☆1,603 · Updated 2 weeks ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models · ☆1,447 · Updated last year
- FlashInfer: Kernel Library for LLM Serving · ☆3,380 · Updated this week
- All Homeworks for TinyML and Efficient Deep Learning Computing 6.5940 • Fall • 2023 • https://efficientml.ai · ☆175 · Updated last year
- Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch · ☆845 · Updated 2 years ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. · ☆366 · Updated 6 months ago
- Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM. · ☆442 · Updated 2 years ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · ☆3,154 · Updated this week
- ☆601 · Updated 2 months ago
- Transparent Cudnn / Cublas / Eigen usage for the deep learning training using MNIST dataset. · ☆17 · Updated 4 years ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… · ☆439 · Updated 10 months ago
- Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training · ☆1,809 · Updated this week
- GEMM by WMMA (tensor core) · ☆13 · Updated 2 years ago
- TinyML and Efficient Deep Learning Computing · ☆13 · Updated last year
- A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. … · ☆1,052 · Updated this week
- An ML Systems Onboarding list · ☆840 · Updated 5 months ago
- ☆143 · Updated last year
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers". · ☆2,143 · Updated last year (see the generic quantization sketch below)
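Several of the repositories above (the SmoothQuant, AWQ, and GPTQ entries, plus the quantization paper lists) revolve around post-training weight quantization for LLMs. As a generic baseline for comparison only, not the method of any listed project, the sketch below shows naive round-to-nearest symmetric int8 quantization with one scale per output channel; the function names are illustrative.

```python
# Generic round-to-nearest (RTN) int8 weight quantization, per output channel.
# Illustrative baseline only: SmoothQuant, AWQ, and GPTQ each add their own
# machinery (activation smoothing, activation-aware scaling, Hessian-based
# weight updates) on top of or instead of this.
import numpy as np

def quantize_rtn_int8(weight):
    """Quantize a (out_features, in_features) float matrix to int8.

    Returns (q, scales) such that weight ≈ q * scales[:, None].
    """
    # One symmetric scale per output channel (row); avoid divide-by-zero rows.
    max_abs = np.abs(weight).max(axis=1, keepdims=True)
    scales = np.where(max_abs == 0, 1.0, max_abs / 127.0)
    q = np.clip(np.round(weight / scales), -127, 127).astype(np.int8)
    return q, scales.squeeze(1)

def dequantize(q, scales):
    # Reconstruct an approximate float matrix from int8 values and scales.
    return q.astype(np.float32) * scales[:, None]

if __name__ == "__main__":
    w = np.random.randn(4, 8).astype(np.float32)
    q, s = quantize_rtn_int8(w)
    err = np.abs(w - dequantize(q, s)).max()
    print(f"max abs reconstruction error: {err:.4f}")
```

Methods such as GPTQ, AWQ, and SmoothQuant improve on this baseline by using calibration data to reduce the accuracy loss that plain rounding incurs at low bit widths.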