A kernel library written in TileLang
☆1,605Apr 23, 2026Updated 2 months ago
Alternatives and similar repositories for TileKernels
Users that are interested in TileKernels are comparing it to the libraries listed below.
- ☆37Aug 7, 2025Updated 10 months ago
- Exploring how optimizations for GEMMs work☆36Feb 28, 2026Updated 4 months ago
- Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity (ACL 2025, oral)☆34Jun 14, 2025Updated last year
- Tile-based language built for AI computation across all scales☆170Jun 16, 2026Updated last week
- Re-implementation of VertexRegen [ICCV 25]☆41Jan 25, 2026Updated 5 months ago
- Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels☆6,552Updated this week
- ☆13Jul 24, 2024Updated last year
- Quantize transformers to any learned arbitrary 4-bit numeric format☆59Apr 13, 2026Updated 2 months ago
- ☆17Oct 15, 2023Updated 2 years ago
- [2026 CVPR] Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation☆111Apr 15, 2026Updated 2 months ago
- High-performance LLM operator library built on TileLang.☆148Updated this week
- project website for "depth sensing beyond LiDAR range"☆11Jul 28, 2020Updated 5 years ago
- AI model training on heterogeneous, geo-distributed resources☆44Nov 24, 2025Updated 7 months ago
- a simple API to use CUPTI☆10Aug 19, 2025Updated 10 months ago
- This is an official GitHub repository for the paper, "Towards timeout-less transport in commodity datacenter networks.".☆17Oct 12, 2021Updated 4 years ago
- Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks.☆14Oct 3, 2022Updated 3 years ago
- A PyTorch native platform for training generative AI models☆17Apr 21, 2026Updated 2 months ago
- The goal of this design is to use the PYNQ-Z2 development board to design a general convolution neural network accelerator. And through r…☆11Sep 30, 2020Updated 5 years ago
- ☆52Jul 31, 2025Updated 10 months ago
- Work related to vectorizing strategies for arbitrary FHE programs☆10Sep 5, 2025Updated 9 months ago
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co…☆16Jan 16, 2026Updated 5 months ago
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh.☆10Jun 10, 2025Updated last year
- Open-source implementation of the CUDA API.☆13May 5, 2012Updated 14 years ago
- ☆33Sep 1, 2025Updated 9 months ago
- ☆12May 24, 2022Updated 4 years ago
- A NCCL extension library, designed to efficiently offload GPU memory allocated by the NCCL communication library.☆111Dec 17, 2025Updated 6 months ago
- ☆15Dec 5, 2024Updated last year
- ☆13Sep 19, 2024Updated last year
- ☆201Updated this week
- This tool displays tflite signatures and rewrites the input/output OP name to the name of the signature. There is no need to install Tens…☆14Dec 13, 2023Updated 2 years ago
- Asynchronous pipeline parallel optimization☆22Feb 2, 2026Updated 4 months ago
- eBPF for GPU UVM offloading and scheduling in Linux kernel☆59Apr 15, 2026Updated 2 months ago
- 💻 SETA: Scaling Environments for Terminal Agents☆113Feb 16, 2026Updated 4 months ago
- Implementation for ACProp (Momentum centering and asynchronous update for adaptive gradient methods, NeurIPS 2021)☆17Oct 11, 2021Updated 4 years ago
- a size profiler for cuda binary☆69Jan 15, 2026Updated 5 months ago
- ☆48Nov 1, 2025Updated 7 months ago
- ☆17Dec 19, 2024Updated last year
- A test case for VFIO_PLATFORM currently based on the PL330 DMA controller. The effort on VFIO_PLATFORM has been partially funded by the S…☆13Dec 12, 2022Updated 3 years ago
- MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models☆27Apr 2, 2026Updated 2 months ago