Agentic Kernel Optimization for All — automated GPU kernel optimization for any kernel, any hardware, any language
☆285 · May 31, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for AKO4ALL
Users that are interested in AKO4ALL are comparing it to the libraries listed below.
- ☆48 · Nov 1, 2025 · Updated 7 months ago
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction. ☆13 · Nov 3, 2023 · Updated 2 years ago
- Import-export medial meshes in Blender ☆13 · Sep 11, 2024 · Updated last year
- ☆14 · Jun 30, 2021 · Updated 4 years ago
- Scaling Laws for Mixture of Experts Models ☆15 · Feb 25, 2025 · Updated last year
- ☆157 · Mar 5, 2026 · Updated 3 months ago
- Utility that parses stack sizes section from elf objects and displays the preallocated stack size of each function. ☆14 · Jan 15, 2020 · Updated 6 years ago
- A library for metric optimization and parametrization described in Optimization in Penner Coordinates ☆22 · Jun 5, 2026 · Updated last week
- A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs ☆15 · Dec 17, 2024 · Updated last year
- Official implementation of the paper: "A deeper look at depth pruning of LLMs" ☆15 · Jul 24, 2024 · Updated last year
- This repo contains the benchmarks for Enzyme on GPUs ☆11 · May 28, 2026 · Updated 2 weeks ago
- The Shifted and The Overlooked: A Task-oriented Investigation of User-GPT Interactions (EMNLP 2023) ☆13 · Dec 21, 2023 · Updated 2 years ago
- Experiments evaluating preemption on the NVIDIA Pascal architecture ☆16 · Nov 10, 2016 · Updated 9 years ago
- ☆17 · Aug 9, 2022 · Updated 3 years ago
- Cavs: An Efficient Runtime System for Dynamic Neural Networks ☆15 · Sep 18, 2020 · Updated 5 years ago
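- 华彩人生 (Huacai Rensheng): a game tailor-made for HUST (Huazhong University of Science and Technology); a fantasy RPG set on HUST's remarkable campus; a high-fidelity HUST simulator. If you want to get to know HUST, be sure to give it a try; you won't be disappointed! Engine: RPGMaker MV ☆15 · Jun 19, 2018 · Updated 7 years ago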
- [ICML2023] Instant Soup Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models. Ajay Jaiswal, Shiwei Liu, Ti… ☆11 · Nov 28, 2023 · Updated 2 years ago
- [ACL2025 Oral🔥] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling ☆29 · Nov 11, 2025 · Updated 7 months ago
- Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA. ☆14 · Feb 8, 2023 · Updated 3 years ago
- Convert CUDA programs from float data type to half or half2 with SIMDization ☆19 · May 28, 2019 · Updated 7 years ago
- [OSDI 2025] DecDEC: A Systems Approach to Advancing Low‑Bit LLM Quantization ☆24 · Jan 29, 2026 · Updated 4 months ago
- ☆18 · Mar 12, 2025 · Updated last year
- An Open Source Kepler GPU Assembler ☆21 · Jan 23, 2017 · Updated 9 years ago
- ☆13 · Aug 31, 2023 · Updated 2 years ago
- ☆13 · Apr 12, 2026 · Updated 2 months ago
- CUTLASS and CuTe Examples ☆136 · Nov 30, 2025 · Updated 6 months ago
- A CUDA kernel optimization toolkit for validation, benchmarking, Nsight Compute profiling, bottleneck analysis, and iterative tuning. It … ☆177 · Apr 22, 2026 · Updated last month
- MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models ☆28 · Apr 2, 2026 · Updated 2 months ago
- ☆135 · Updated this week
- ☆80 · Feb 5, 2026 · Updated 4 months ago
- Work related to vectorizing strategies for arbitrary FHE programs ☆10 · Sep 5, 2025 · Updated 9 months ago
- Repository for our CVPR 2021 Global Flow Transport Paper ☆20 · Jun 24, 2021 · Updated 4 years ago
- ☆17 · Sep 27, 2022 · Updated 3 years ago
- ☆12 · Apr 30, 2024 · Updated 2 years ago
- Step-by-step optimization of CUDA SGEMM ☆475 · Mar 30, 2022 · Updated 4 years ago
- HUST-CS-2019 Comprehensive Hardware Training, Computer Organization course project, RISC-V implementation ☆16 · Nov 3, 2022 · Updated 3 years ago
- ☆17 · Updated this week
- Ongoing research training transformer models at scale ☆18 · Jul 27, 2023 · Updated 2 years ago
- Repository for answers for exercises in Programming Massively Parallel Processors book ☆16 · Aug 10, 2024 · Updated last year