vdesai2014/inference-optimization-blog-post

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/vdesai2014/inference-optimization-blog-post)

vdesai2014 / inference-optimization-blog-post

☆92

Alternatives and similar repositories for inference-optimization-blog-post

Users that are interested in inference-optimization-blog-post are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

huggingface / candle-paged-attention
View on GitHub
☆12Jan 4, 2024Updated 2 years ago
nicksypark / rope-triton
View on GitHub
☆15Mar 30, 2024Updated 2 years ago
FL33TW00D / wattkit
View on GitHub
☆15Dec 4, 2024Updated last year
gpu-mode / ring-attention
View on GitHub
ring-attention experiments
☆171Oct 17, 2024Updated last year
StuartSul / gpu-experiments
View on GitHub
A collection of GPU experiments and benchmarks for my personal understanding and research.
☆34Updated this week
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
gpu-mode / triton-index
View on GitHub
Cataloging released Triton kernels.
☆310Sep 9, 2025Updated 10 months ago
gau-nernst / quantized-training
View on GitHub
Explore training for quantized models
☆26Jul 12, 2025Updated last year
huggingface / hf-rocm-kernels
View on GitHub
☆24May 26, 2026Updated last month
SzymonOzog / GPU_Programming
View on GitHub
☆98May 30, 2026Updated last month
tspeterkim / mixed-precision-from-scratch
View on GitHub
Mixed precision training from scratch with Tensors and CUDA
☆30May 14, 2024Updated 2 years ago
FL33TW00D / deCoreML
View on GitHub
Find out why your CoreML model isn't running on the Neural Engine!
☆30Jun 18, 2024Updated 2 years ago
cloneofsimo / min-fsdp
View on GitHub
☆93Jul 5, 2024Updated 2 years ago
FL33TW00D / wgpu-bench
View on GitHub
☆12Jun 27, 2024Updated 2 years ago
smpanaro / ModernBERT-AppleNeuralEngine
View on GitHub
ModernBERT model optimized for Apple Neural Engine.
☆38Jan 10, 2025Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
gpu-mode / resource-stream
View on GitHub
GPU programming related news and material links
☆2,239Jun 15, 2026Updated last month
mikex86 / tritonc
View on GitHub
Standalone commandline CLI tool for compiling Triton kernels
☆20Sep 13, 2024Updated last year
prescient-design / e3tools
View on GitHub
Building Blocks for Equivariant Neural Networks in e3nn and PyTorch 2.0
☆20Nov 16, 2025Updated 8 months ago
Algomancer / The-Daily-Train
View on GitHub
Training Models Daily
☆16Dec 19, 2023Updated 2 years ago
kuterd / opal_ptx
View on GitHub
Experimental GPU language with meta-programming
☆31Sep 6, 2024Updated last year
FL33TW00D / coremlprofiler
View on GitHub
Profile your CoreML models directly from Python 🐍
☆29Sep 8, 2025Updated 10 months ago
jk96491 / C-COMA
View on GitHub
Continual Multi-agent Reinforcement Learning in Dynamic Environments
☆11Jul 1, 2021Updated 5 years ago
smpanaro / apple-silicon-4bit-quant
View on GitHub
Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization"
☆11Mar 31, 2024Updated 2 years ago
EugenHotaj / llm_parallelisms.c
View on GitHub
LLM training parallelisms (DP, FSDP, TP, PP) in pure C
☆29Jan 27, 2026Updated 5 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
alexzhang13 / Triton-Puzzles-Solutions
View on GitHub
Personal solutions to the Triton Puzzles
☆22Jul 18, 2024Updated 2 years ago
prescient-design / jamun
View on GitHub
Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensembles
☆20May 16, 2026Updated 2 months ago
huggingface / kernel-builder
View on GitHub
👷 Build compute kernels
☆213Apr 6, 2026Updated 3 months ago
atomicarchitects / PriceofFreedom
View on GitHub
[ICML'25] The Price of Freedom: Exploring Expressivity and Runtime Tradeoffs in Equivariant Tensor Products
☆19Jul 16, 2025Updated last year
HuyNguyen-hust / hopper-gemm-101
View on GitHub
☆13Dec 22, 2024Updated last year
ThoenigAdrian / NeuralNetworksCudaTutorial
View on GitHub
Implement Neural Networks in Cuda from Scratch
☆23May 17, 2024Updated 2 years ago
daniel-geon-park / triton_bwd
View on GitHub
Automatic differentiation for Triton Kernels
☆29Aug 12, 2025Updated 11 months ago
siboehm / SGEMM_CUDA
View on GitHub
Fast CUDA matrix multiplication from scratch
☆1,262Sep 2, 2025Updated 10 months ago
YJMSTR / flash-linear-attention
View on GitHub
FLA but cuTile
☆27Apr 17, 2026Updated 3 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
gpu-mode / Triton-Puzzles
View on GitHub
Puzzles for learning Triton
☆2,537Apr 1, 2026Updated 3 months ago
yenchenlin / mira
View on GitHub
☆18Feb 19, 2023Updated 3 years ago
Narsil / bloomserver
View on GitHub
☆39Oct 3, 2022Updated 3 years ago
SebastianBodza / EnsembleForecasting
View on GitHub
Using multiple LLMs for ensemble Forecasting
☆16Jan 17, 2024Updated 2 years ago
diff-use / sampleworks
View on GitHub
Framework for modified sampling from biomolecular generative models
☆21Updated this week
huggingface / ember
View on GitHub
ANE accelerated embedding models!
☆20Dec 11, 2024Updated last year
rapidsai / deployment
View on GitHub
RAPIDS Deployment Documentation
☆15Updated this week