facebookresearch / MODel_opt

Memory Optimizations for Deep Learning (ICML 2023)

☆111

Alternatives and similar repositories for MODel_opt

Users who are interested in MODel_opt are comparing it to the libraries listed below.

cchan / tccl
extensible collectives library in triton
☆91 Updated 8 months ago
stanford-futuredata / stk
☆113 Updated last year
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆226 Updated last year
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆172 Updated 8 months ago
meta-pytorch / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆297 Updated this week
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161 Updated 2 months ago
Deep-Learning-Profiling-Tools / triton-viz
☆257 Updated this week
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆159 Updated 2 years ago
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆308 Updated 3 months ago
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆402 Updated 2 weeks ago
Jokeren / triton-samples
☆28 Updated 10 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆216 Updated 2 weeks ago
albanD / subclass_zoo
☆185 Updated last year
gpu-mode / ring-attention
ring-attention experiments
☆160 Updated last year
triton-lang / kernels
☆94 Updated last year
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆658 Updated this week
hpcaitech / TensorNVMe
A Python library that transfers PyTorch tensors between CPU and NVMe
☆122 Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆274 Updated 2 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆61 Updated last week
topal-team / rockmate
☆36 Updated 11 months ago
graphcore-research / out-of-the-box-fp8-training
Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.
☆46 Updated last year
open-lm-engine / accelerated-model-architectures
A bunch of kernels that might make stuff slower 😉
☆65 Updated this week
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80 Updated last year
ezyang / torchdbg
PyTorch centric eager mode debugger
☆48 Updated 11 months ago
google / aqt
☆337 Updated 2 weeks ago
meta-pytorch / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆47 Updated 3 months ago
wangsiping97 / FastGEMV
High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline.
☆123 Updated last year
meta-pytorch / superblock
A block oriented training approach for inference time optimization.
☆33 Updated last year
tspeterkim / paged-attention-minimal
a minimal cache manager for PagedAttention, on top of llama3.
☆126 Updated last year
meta-pytorch / BackendBench
Ship correct and fast LLM kernels to PyTorch
☆125 Updated 3 weeks ago