sustcsonglin / fla-tilelang
☆35Mar 7, 2025Updated 11 months ago
Alternatives and similar repositories for fla-tilelang
Users who are interested in fla-tilelang are comparing it to the libraries listed below.
- Learning TileLang with 10 puzzles!☆118Jan 30, 2026Updated 2 weeks ago
- ☆52May 19, 2025Updated 8 months ago
- ☆105Feb 25, 2025Updated 11 months ago
- ☆118May 19, 2025Updated 8 months ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 6 months ago
- A collection of specialized agent skills for AI infrastructure development, enabling Claude Code to write, optimize, and debug high-perfo…☆57Feb 2, 2026Updated last week
- ☆65Apr 26, 2025Updated 9 months ago
- ☆38Jul 19, 2025Updated 6 months ago
- ☆32Jul 2, 2025Updated 7 months ago
- NVIDIA cuTile learn☆158Dec 9, 2025Updated 2 months ago
- ☆41Oct 15, 2025Updated 3 months ago
- ☆38Aug 7, 2025Updated 6 months ago
- Xmixers: A collection of SOTA efficient token/channel mixers☆28Sep 4, 2025Updated 5 months ago
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆163Updated this week
- Tile-based language built for AI computation across all scales☆120Updated this week
- Flash-Muon: An Efficient Implementation of Muon Optimizer☆233Jun 15, 2025Updated 7 months ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.☆148May 10, 2025Updated 9 months ago
- Building the Virtuous Cycle for AI-driven LLM Systems☆164Updated this week
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning☆137Dec 19, 2025Updated last month
- code for COLING paper "A Hybrid Model of Classification and Generation for Spatial Relation Extraction"☆10Oct 20, 2022Updated 3 years ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"☆964Feb 5, 2026Updated last week
- ☆85Jan 23, 2025Updated last year
- The official implementation of the ECCV 2024 paper: Continuity Preserving Online CenterLine Graph Learning☆34Dec 16, 2024Updated last year
- ☆44Nov 1, 2025Updated 3 months ago
- Distributed Compiler based on Triton for Parallel Systems☆1,350Updated this week
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators☆114Jun 14, 2025Updated 7 months ago
- my solution for UC Berkeley AI projects pacman☆11Jul 25, 2020Updated 5 years ago
- ☆175May 7, 2025Updated 9 months ago
- Tile primitives for speedy kernels☆3,139Updated this week
- cuTile is a programming model for writing parallel kernels for NVIDIA GPUs☆1,926Updated this week
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 4 months ago
- A deep learning intermediate representation for multi-platform compilation optimization☆10Oct 28, 2024Updated last year
- This repository shows how to calibrate a camera and lidar, including the camera intrinsics and the camera-lidar extrinsics☆10Nov 28, 2021Updated 4 years ago
- ☆10Apr 7, 2025Updated 10 months ago
- ☆288Updated this week
- ☆66Jul 8, 2025Updated 7 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models☆51Jun 12, 2025Updated 8 months ago
- DiFSD: Ego-Centric Fully Sparse Paradigm for End-to-End Self-Driving☆14Mar 9, 2025Updated 11 months ago
- A Statistical Arbitrage Strategy to trade Cryptocurrency Pairs☆13Nov 6, 2020Updated 5 years ago