☆21 · Mar 3, 2025 · Updated last year
Alternatives and similar repositories for no-libtorch-compile
Users who are interested in no-libtorch-compile are comparing it to the libraries listed below
- extensible collectives library in triton · ☆96 · Mar 31, 2025 · Updated 11 months ago
- FlexAttention w/ FlashAttention3 Support · ☆27 · Oct 5, 2024 · Updated last year
- See https://github.com/cuda-mode/triton-index/ instead! · ☆11 · May 8, 2024 · Updated last year
- Collection of scripts to build PyTorch and the domain libraries from source. · ☆13 · Feb 4, 2026 · Updated last month
- ☆19 · Dec 4, 2025 · Updated 3 months ago
- TORCH_TRACE parser for PT2 · ☆78 · Feb 26, 2026 · Updated last week
- Prototype routines for GPU quantization written using PyTorch. · ☆21 · Feb 8, 2026 · Updated last month
- Utilities for PyTorch distributed · ☆25 · Feb 27, 2025 · Updated last year
- ☆13 · Jun 18, 2024 · Updated last year
- ☆12 · Aug 26, 2025 · Updated 6 months ago
- Experimental GPU language with meta-programming · ☆26 · Sep 6, 2024 · Updated last year
- PyTorch centric eager mode debugger · ☆48 · Dec 16, 2024 · Updated last year
- ☆12 · Jan 4, 2024 · Updated 2 years ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code · ☆10 · Aug 29, 2023 · Updated 2 years ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆49 · Aug 18, 2025 · Updated 6 months ago
- Personal solutions to the Triton Puzzles · ☆20 · Jul 18, 2024 · Updated last year
- Conformer block with Rotary Position Embedding, modified from lucidrains' implementation · ☆18 · Sep 13, 2024 · Updated last year
- Ultra low overhead NVIDIA GPU telemetry plugin for telegraf with memory temperature readings. · ☆63 · Jul 8, 2024 · Updated last year
- Blazing fast data loading with HuggingFace Dataset and Ray Data · ☆16 · Jan 12, 2024 · Updated 2 years ago
- ☆24 · Dec 11, 2024 · Updated last year
- SMT-LIB benchmarks for shape computations from deep learning models in PyTorch · ☆18 · Dec 21, 2022 · Updated 3 years ago
- Learning to Model Editing Processes · ☆26 · Aug 3, 2025 · Updated 7 months ago
- An experimental implementation of compiler-driven automatic sharding of models across a given device mesh. · ☆52 · Updated this week
- URL downloader supporting checkpointing and continuous checksumming. · ☆19 · Nov 29, 2023 · Updated 2 years ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) · ☆481 · Updated this week
- Torch Distributed Experimental · ☆117 · Aug 5, 2024 · Updated last year
- ☆20 · Nov 23, 2022 · Updated 3 years ago
- ☆20 · Jul 12, 2023 · Updated 2 years ago
- DL Dataloader Benchmarks · ☆20 · Jan 27, 2025 · Updated last year
- GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU · ☆24 · Mar 27, 2025 · Updated 11 months ago
- An implementation of the Llama architecture, to instruct and delight · ☆21 · May 31, 2025 · Updated 9 months ago
- ☆51 · Jan 28, 2024 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 · ☆20 · Oct 23, 2023 · Updated 2 years ago
- ☆19 · Mar 22, 2024 · Updated last year
- Utilities for Training Very Large Models · ☆58 · Sep 25, 2024 · Updated last year
- ☆301 · Updated this week
- ☆23 · Jun 18, 2024 · Updated last year
- Benchmark code for the "Online normalizer calculation for softmax" paper · ☆108 · Jul 27, 2018 · Updated 7 years ago
- Scalable and Performant Data Loading · ☆368 · Feb 26, 2026 · Updated last week