google-research-datasets / tpu_graphs
☆123 · Updated 9 months ago
Alternatives and similar repositories for tpu_graphs:
Users interested in tpu_graphs are comparing it to the libraries listed below.
- Collection of kernels written in the Triton language · ☆114 · Updated last month
- Cataloging released Triton kernels · ☆208 · Updated 2 months ago
- Memory Optimizations for Deep Learning (ICML 2023) · ☆62 · Updated last year
- A schedule language for large model training · ☆145 · Updated 9 months ago
- Ring-attention experiments · ☆128 · Updated 5 months ago
- An experimentation platform for LLM inference optimisation · ☆29 · Updated 6 months ago
- This repository contains the experimental PyTorch native float8 training UX · ☆222 · Updated 7 months ago
- Understand and test language model architectures on synthetic tasks · ☆185 · Updated 3 weeks ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… · ☆155 · Updated 3 months ago
- Fast low-bit matmul kernels in Triton · ☆272 · Updated this week
- Explorations into some recent techniques surrounding speculative decoding · ☆250 · Updated 3 months ago
- KernelBench: Can LLMs Write GPU Kernels? A benchmark with Torch -> CUDA problems · ☆237 · Updated last week
- PyTorch library for cost-effective, fast, and easy serving of MoE models · ☆151 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! · ☆34 · Updated this week
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed" · ☆165 · Updated 3 months ago
- Code for studying the super weight in LLMs · ☆94 · Updated 3 months ago
- An extensible collectives library in Triton · ☆84 · Updated 6 months ago
- Triton-based implementation of Sparse Mixture of Experts · ☆208 · Updated 4 months ago
- Implementation of a Transformer, but completely in Triton · ☆261 · Updated 2 years ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components · ☆189 · Updated last week
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity · ☆202 · Updated last year
- Tutorials for running models on first-gen Gaudi and Gaudi2 for training and inference. The source files for the tutorials on https://dev… · ☆57 · Updated last week
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration · ☆201 · Updated 4 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ☆127 · Updated last year
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware · ☆107 · Updated 3 months ago