☆91 · Feb 29, 2024 · Updated 2 years ago
Alternatives and similar repositories for inference-optimization-blog-post
Users who are interested in inference-optimization-blog-post are comparing it to the libraries listed below.
- ☆12 · Jan 4, 2024 · Updated 2 years ago
- ☆15 · Mar 30, 2024 · Updated 2 years ago
- ring-attention experiments ☆165 · Oct 17, 2024 · Updated last year
- ModernBERT model optimized for Apple Neural Engine. ☆31 · Jan 10, 2025 · Updated last year
- Explore training for quantized models ☆26 · Jul 12, 2025 · Updated 9 months ago
- ☆23 · Apr 7, 2026 · Updated last week
- ☆93 · Nov 11, 2025 · Updated 5 months ago
- Experimental GPU language with meta-programming ☆27 · Sep 6, 2024 · Updated last year
- ☆19 · Dec 31, 2025 · Updated 3 months ago
- Mixed precision training from scratch with Tensors and CUDA ☆29 · May 14, 2024 · Updated last year
- ☆92 · Jul 5, 2024 · Updated last year
- Personal solutions to the Triton Puzzles ☆20 · Jul 18, 2024 · Updated last year
- ☆12 · Jun 27, 2024 · Updated last year
- Training Models Daily ☆16 · Dec 19, 2023 · Updated 2 years ago
- GPU programming related news and material links ☆2,093 · Mar 8, 2026 · Updated last month
- minimal C implementation of speculative decoding based on llama2.c ☆29 · Jul 15, 2024 · Updated last year
- Continual Multi-agent Reinforcement Learning in Dynamic Environments ☆11 · Jul 1, 2021 · Updated 4 years ago
- Standalone commandline CLI tool for compiling Triton kernels ☆20 · Sep 13, 2024 · Updated last year
- Train small sequence models in your browser with WebGPU. ☆34 · Dec 3, 2025 · Updated 4 months ago
- Quantized LLM training in pure CUDA/C++. ☆244 · Mar 6, 2026 · Updated last month
- ☆12 · Dec 22, 2024 · Updated last year
- ☆17 · Updated this week
- Cute layout visualization ☆32 · Jan 18, 2026 · Updated 2 months ago
- ☆39 · Oct 3, 2022 · Updated 3 years ago
- Apply GPU in ML and DL ☆67 · Mar 23, 2026 · Updated 3 weeks ago
- ☆18 · Feb 19, 2023 · Updated 3 years ago
- Using multiple LLMs for ensemble Forecasting ☆16 · Jan 17, 2024 · Updated 2 years ago
- Puzzles for learning Triton ☆2,359 · Apr 1, 2026 · Updated 2 weeks ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆21 · Jan 24, 2025 · Updated last year
- Repository for the EuroSciPy sprint ☆14 · May 21, 2023 · Updated 2 years ago
- [ICLR'24] Symphony: Symmetry-Equivariant Point-Centered Spherical Harmonics for Molecule Generation ☆30 · Feb 24, 2025 · Updated last year
- Experimental compiler for deep learning models ☆75 · Sep 18, 2025 · Updated 6 months ago
- Fast CUDA matrix multiplication from scratch ☆1,127 · Sep 2, 2025 · Updated 7 months ago
- Applied AI experiments and examples for PyTorch ☆320 · Aug 22, 2025 · Updated 7 months ago
- ☆146 · Apr 4, 2026 · Updated last week
- super-resolution; post-training quantization; model compression ☆14 · Nov 10, 2023 · Updated 2 years ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆1,118 · Dec 30, 2024 · Updated last year
- Hacks for PyTorch ☆19 · Apr 18, 2023 · Updated 2 years ago
- ☆22 · Apr 22, 2024 · Updated last year