openxla / triton
Fork of Triton repository for OpenXLA uses of the Triton language and compiler
☆10Updated this week
Related projects:
- ☆22Updated this week
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency☆23Updated last month
- asynchronous/distributed speculative evaluation for llama3☆36Updated last month
- Attention in SRAM on Tenstorrent Grayskull☆22Updated 2 months ago
- Loop Nest - Linear algebra compiler and code generator.☆22Updated last year
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend.☆94Updated this week
- Simple and fast low-bit matmul kernels in CUDA☆48Updated this week
- ☆17Updated last week
- tenstorrent kernel from twitch☆26Updated 6 months ago
- minimal C implementation of speculative decoding based on llama2.c☆16Updated 2 months ago
- Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention.☆13Updated 11 months ago
- SynapseAI Core is a reference implementation of the SynapseAI API running on Habana Gaudi☆36Updated last year
- JAX implementations of RWKV☆18Updated 11 months ago
- Torch Frontend for IREE☆25Updated 8 months ago
- A tracing JIT compiler for PyTorch☆12Updated 2 years ago
- MACTA: A Multi-agent Reinforcement Learning Approach for Cache Timing Attacks and Detection☆45Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆30Updated 4 months ago
- Unified compiler/runtime for interfacing with PyTorch Dynamo.☆90Updated this week
- A server powering LAION's effort to filter CommonCrawl with CLIP, building a large scale image-text dataset.☆13Updated last year
- Better bindings for Python☆17Updated last year
- Heavyweight Python dynamic analysis framework☆12Updated 5 months ago
- Personal solutions to the Triton Puzzles☆11Updated 2 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.☆34Updated 2 months ago
- Repository of model demos using TT-Buda☆54Updated this week
- benchmarking some transformer deployments☆26Updated last year
- LLama implementations benchmarking framework☆10Updated 10 months ago
- RDNA3 emulator☆43Updated last week
- Course Project for COMP4471 on RWKV☆16Updated 7 months ago
- hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionalities beyond a traditiona…☆48Updated this week
- Tenstorrent MLIR compiler☆52Updated this week