google-research-datasets / tpu_graphsLinks

☆125

Alternatives and similar repositories for tpu_graphs

Users that are interested in tpu_graphs are comparing it to the libraries listed below

Sorting:

facebookresearch / MODel_opt
Memory Optimizations for Deep Learning (ICML 2023)
☆102Updated last year
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆142Updated 4 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆207Updated this week
graphcore / tutorials
Training material for IPU users: tutorials, feature examples, simple applications
☆86Updated 2 years ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆48Updated last week
stanford-futuredata / stk
☆107Updated 11 months ago
llm-efficiency-challenge / neurips_llm_efficiency_challenge
NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day
☆256Updated last year
pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆158Updated last month
microsoft / varuna
☆251Updated last year
insuhan / hyper-attn
☆81Updated last year
google / aqt
☆323Updated last week
gpu-mode / ring-attention
ring-attention experiments
☆146Updated 9 months ago
siboehm / ShallowSpeed
Small scale distributed training of sequential deep learning models, built on Numpy and MPI.
☆137Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆247Updated 6 months ago
Jokeren / triton-samples
☆28Updated 6 months ago
ambisinister / mla-experiments
Experiments on Multi-Head Latent Attention
☆94Updated 11 months ago
awslabs / slapo
A schedule language for large model training
☆149Updated last year
Ying1123 / awesome-neural-symbolic
A list of awesome neural symbolic papers.
☆47Updated 3 years ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆230Updated 8 months ago
efeslab / fiddler
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆225Updated 8 months ago
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224Updated last year
CerebrasResearch / Sparse-IFT
Official repository of Sparse ISO-FLOP Transformations for Maximizing Training Efficiency
☆25Updated last year
Deep-Learning-Profiling-Tools / triton-viz
☆227Updated this week
tspeterkim / paged-attention-minimal
a minimal cache manager for PagedAttention, on top of llama3.
☆110Updated 11 months ago
CentML / DeepView.Profile
🏙 Interactive performance profiling and debugging tool for PyTorch neural networks.
☆64Updated 6 months ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆273Updated 3 years ago
axonn-ai / axonn
A parallel framework for training deep neural networks
☆63Updated 4 months ago
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆114Updated 8 months ago
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆69Updated 5 months ago
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆339Updated this week