Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS
☆33Feb 10, 2025Updated last year
Alternatives and similar repositories for Tacker
Users that are interested in Tacker are comparing it to the libraries listed below.
- Reproduction of the libsmctrl paper, with an added Python interface so that compute resources can be allocated flexibly from Python☆12May 21, 2024Updated 2 years ago
- ☆19Mar 4, 2025Updated last year
- ☆12May 24, 2022Updated 4 years ago
- Implementation of Hyena Hierarchy in JAX☆10Apr 30, 2023Updated 3 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆15Jun 21, 2019Updated 7 years ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆29Updated this week
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive☆74Dec 11, 2025Updated 6 months ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)☆57May 29, 2024Updated 2 years ago
- FTPipe and related pipeline model parallelism research.☆44May 16, 2023Updated 3 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆19Jun 25, 2026Updated last week
- Tutorials of Extending and importing TVM with CMAKE Include dependency.☆16Oct 11, 2024Updated last year
- ☆122Nov 17, 2023Updated 2 years ago
- ☆17Jan 24, 2024Updated 2 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆13Nov 23, 2024Updated last year
- ☆38Jun 27, 2025Updated last year
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling☆72May 1, 2024Updated 2 years ago
- ☆20Sep 28, 2024Updated last year
- POC implementation of "Accelerating HE Operations Using Key Decomposition"[KLSS23]☆19Jun 11, 2025Updated last year
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.☆19May 12, 2024Updated 2 years ago
- ☆83Jun 23, 2025Updated last year
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension)☆17Jan 11, 2025Updated last year
- ☆26Feb 20, 2024Updated 2 years ago
- Depict GPU memory footprint during DNN training of PyTorch☆11Nov 17, 2022Updated 3 years ago
- ☆78May 4, 2021Updated 5 years ago
- PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications☆127May 9, 2022Updated 4 years ago
- ☆20Aug 26, 2021Updated 4 years ago
- Horizontal Fusion☆24Jan 7, 2022Updated 4 years ago
- Artifact for PPoPP22 QGTC: Accelerating Quantized GNN via GPU Tensor Core.☆30Feb 12, 2022Updated 4 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆112Jun 28, 2025Updated last year
- ngAP's artifact for ASPLOS'24☆25Jul 29, 2025Updated 11 months ago
- Open-source Framework for HPCA2024 paper: Gemini: Mapping and Architecture Co-exploration for Large-scale DNN Chiplet Accelerators☆115Apr 28, 2025Updated last year
- Framework to reduce autotune overhead to zero for well known deployments.☆101Sep 19, 2025Updated 9 months ago
- Tahoe: Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU☆22Aug 29, 2024Updated last year
- Linux docker for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop☆68Oct 14, 2025Updated 8 months ago
- Artifacts for our ASPLOS'23 paper ElasticFlow☆56May 10, 2024Updated 2 years ago
- ☆11Apr 16, 2023Updated 3 years ago
- ☆13Nov 1, 2021Updated 4 years ago
- ☆32Jul 17, 2024Updated last year
- Create tiny ML systems for on-device learning.☆19Jul 14, 2021Updated 4 years ago