Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆33Feb 10, 2025Updated last year
Alternatives and similar repositories for Tacker
Users that are interested in Tacker are comparing it to the libraries listed below.
- GoPTX: Fine-grained GPU Kernel Fusion by PTX-level Instruction Flow Weaving☆19Jul 30, 2025Updated 9 months ago
- A reproduction of the libsmctrl paper, with an added Python interface so that compute resources can be allocated flexibly from Python☆12May 21, 2024Updated last year
- ☆19Mar 4, 2025Updated last year
- ☆12May 24, 2022Updated 3 years ago
- Implementation of Hyena Hierarchy in JAX☆10Apr 30, 2023Updated 3 years ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆22Updated this week
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive☆68Dec 11, 2025Updated 4 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)☆57May 29, 2024Updated last year
- FTPipe and related pipeline model parallelism research.☆44May 16, 2023Updated 2 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆19Apr 24, 2026Updated last week
- Tutorials of Extending and importing TVM with CMAKE Include dependency.☆16Oct 11, 2024Updated last year
- ☆120Nov 17, 2023Updated 2 years ago
- ☆17Jan 24, 2024Updated 2 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆13Nov 23, 2024Updated last year
- ☆38Jun 27, 2025Updated 10 months ago
- Jetson embedded platform-target deep learning inference acceleration framework with TensorRT☆30Oct 10, 2025Updated 6 months ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling☆71May 1, 2024Updated 2 years ago
- ☆20Sep 28, 2024Updated last year
- POC implementation of "Accelerating HE Operations Using Key Decomposition"[KLSS23]☆19Jun 11, 2025Updated 10 months ago
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions☆21Apr 15, 2022Updated 4 years ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.☆19May 12, 2024Updated last year
- ☆79Jun 23, 2025Updated 10 months ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension)☆17Jan 11, 2025Updated last year
- Depict GPU memory footprint during DNN training of PyTorch☆11Nov 17, 2022Updated 3 years ago
- ☆78May 4, 2021Updated 4 years ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications☆127May 9, 2022Updated 3 years ago
- Horizontal Fusion☆24Jan 7, 2022Updated 4 years ago
- ☆20Aug 26, 2021Updated 4 years ago
- Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.☆30Feb 12, 2022Updated 4 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆108Jun 28, 2025Updated 10 months ago
- ngAP's artifact for ASPLOS'24☆25Jul 29, 2025Updated 9 months ago
- Open-source Framework for HPCA2024 paper: Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators☆112Apr 28, 2025Updated last year
- Framework to reduce autotune overhead to zero for well known deployments.☆99Sep 19, 2025Updated 7 months ago
- Linux docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop☆67Oct 14, 2025Updated 6 months ago
- Artifacts for our ASPLOS'23 paper ElasticFlow☆56May 10, 2024Updated last year
- ☆11Apr 16, 2023Updated 3 years ago
- ☆13Nov 1, 2021Updated 4 years ago
- ☆32Jul 17, 2024Updated last year
- Create tiny ML systems for on-device learning.☆19Jul 14, 2021Updated 4 years ago