google-research-datasets / tpu_graphs
☆125 · Updated 11 months ago
Alternatives and similar repositories for tpu_graphs
Users interested in tpu_graphs are comparing it to the repositories listed below.
- Collection of kernels written in the Triton language ☆125 · Updated 2 months ago
- ☆81 · Updated last year
- The simplest implementation of recent sparse attention patterns for efficient LLM inference. ☆62 · Updated 4 months ago
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ☆256 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts. ☆217 · Updated 6 months ago
- ☆144 · Updated 2 years ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆157 · Updated 6 months ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆36 · Updated last year
- Cataloging released Triton kernels. ☆229 · Updated 4 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. ☆133 · Updated last year
- Implementation of a Transformer, but completely in Triton ☆266 · Updated 3 years ago
- ☆23 · Updated last year
- Ring-attention experiments ☆145 · Updated 7 months ago
- Fast low-bit matmul kernels in Triton ☆311 · Updated this week
- Understand and test language model architectures on synthetic tasks. ☆197 · Updated 3 months ago
- Fast and memory-efficient exact attention ☆68 · Updated 3 months ago
- ☆205 · Updated 2 years ago
- ☆215 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models, leveraging PyTorch native components. ☆198 · Updated this week
- Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency ☆25 · Updated 10 months ago
- ☆105 · Updated 9 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 · Updated this week
- ☆35 · Updated 5 months ago
- An extension of the nanoGPT repository for training small MoE models. ☆147 · Updated 2 months ago
- Cold Compress is a hackable, lightweight, open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆134 · Updated 9 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆108 · Updated 7 months ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆211 · Updated 6 months ago
- ICLR 2021 ☆48 · Updated 4 years ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 · Updated last year
- ☆228 · Updated 3 months ago