Cute layout visualization
☆40Jan 18, 2026Updated 5 months ago
Alternatives and similar repositories for cute-viz
Users who are interested in cute-viz are comparing it to the libraries listed below.
- High-performance RMSNorm implementation using SM core storage (registers and shared memory)☆30Jan 22, 2026Updated 4 months ago
- Expert Specialization MoE Solution based on CUTLASS☆27Apr 14, 2026Updated 2 months ago
- ☆12Jan 4, 2024Updated 2 years ago
- DeeperGEMM: crazy optimized version☆86May 5, 2025Updated last year
- ☆14Nov 3, 2025Updated 7 months ago
- a size profiler for cuda binary☆70Jan 15, 2026Updated 5 months ago
- ☆32Jul 2, 2025Updated 11 months ago
- RWKV v5,v6 LoRA Trainer on Cuda and Rocm Platform. RWKV is a RNN with transformer-level LLM performance. It can be directly trained like …☆13Mar 24, 2024Updated 2 years ago
- Artifacts of EVT ASPLOS'24☆30Mar 6, 2024Updated 2 years ago
- ☆18Jan 1, 2023Updated 3 years ago
- ☆23Aug 20, 2025Updated 9 months ago
- ☆80Feb 5, 2026Updated 4 months ago
- Fastest kernels written from scratch☆583Sep 18, 2025Updated 9 months ago
- ☆14Nov 26, 2023Updated 2 years ago
- Asynchronous pipeline parallel optimization☆22Feb 2, 2026Updated 4 months ago
- A CUDA kernel for NHWC GroupNorm for PyTorch☆23Nov 15, 2024Updated last year
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.☆156May 10, 2025Updated last year
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …☆32Dec 21, 2024Updated last year
- Unofficial PyTorch reproduction of DeepSeek's Thinking with Visual Primitives.☆107Jun 11, 2026Updated last week
- The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221☆31Apr 22, 2025Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆193Feb 11, 2026Updated 4 months ago
- Boosting GPU utilization for LLM serving via dynamic spatial-temporal prefill & decode orchestration☆51Jan 8, 2026Updated 5 months ago
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization"☆11Mar 31, 2024Updated 2 years ago
- A practical way of learning Swizzle☆41Feb 3, 2025Updated last year
- The code of "Learning Crisp Boundaries Using Deep Refinement Network and Adaptive Weighting Loss"☆12Feb 1, 2021Updated 5 years ago
- ☆23Sep 9, 2024Updated last year
- A Triton-only attention backend for vLLM☆26Mar 17, 2026Updated 3 months ago
- my solution for UC Berkeley AI projects pacman☆11Jul 25, 2020Updated 5 years ago
- Accelerating MoE with IO and Tile-aware Optimizations☆714Updated this week
- ☆11Feb 13, 2025Updated last year
- ☆13Nov 27, 2025Updated 6 months ago
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co…☆16Jan 16, 2026Updated 5 months ago
- ☆18Apr 30, 2025Updated last year
- amdgpu example code in hip/asm☆64Jun 3, 2026Updated 2 weeks ago
- Experiments on Multi-Head Latent Attention☆101Aug 19, 2024Updated last year
- ☆108May 31, 2025Updated last year
- Graph model execution API for Candle☆18Jul 27, 2025Updated 10 months ago
- A PyTorch-Based GPU Parallel Env for IPPS Problem, supporting DRL, IL and Learning Guided MCTS.☆17May 25, 2026Updated 3 weeks ago
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling☆27Updated this week