lianakoleva / no-libtorch-compile
☆21Mar 3, 2025Updated 11 months ago
Alternatives and similar repositories for no-libtorch-compile
Users who are interested in no-libtorch-compile are comparing it to the libraries listed below.
- extensible collectives library in triton☆95Mar 31, 2025Updated 10 months ago
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- See https://github.com/cuda-mode/triton-index/ instead!☆11May 8, 2024Updated last year
- ☆19Dec 4, 2025Updated 2 months ago
- TORCH_TRACE parser for PT2☆78Updated this week
- Prototype routines for GPU quantization written using PyTorch.☆21Feb 8, 2026Updated last week
- Utilities for PyTorch distributed☆25Feb 27, 2025Updated 11 months ago
- ☆13Jun 18, 2024Updated last year
- Cuda extensions for PyTorch☆12Dec 2, 2025Updated 2 months ago
- ☆12Aug 26, 2025Updated 5 months ago
- Experimental GPU language with meta-programming☆25Sep 6, 2024Updated last year
- PyTorch centric eager mode debugger☆48Dec 16, 2024Updated last year
- ☆12Jan 4, 2024Updated 2 years ago
- Simple (fast) transformer inference in PyTorch with torch.compile + lit-llama code☆10Aug 29, 2023Updated 2 years ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆49Aug 18, 2025Updated 5 months ago
- Explorations into the proposed SDFT, Self-Distillation Enables Continual Learning, from Shenfeld et al. of MIT☆29Feb 6, 2026Updated last week
- Conformer block with Rotary Position Embedding, modified from lucidrains' implementation☆16Sep 13, 2024Updated last year
- Personal solutions to the Triton Puzzles☆20Jul 18, 2024Updated last year
- Source-to-Source Debuggable Derivatives in Pure Python☆15Jan 23, 2024Updated 2 years ago
- Ultra low overhead NVIDIA GPU telemetry plugin for telegraf with memory temperature readings.☆63Jul 8, 2024Updated last year
- SMT-LIB benchmarks for shape computations from deep learning models in PyTorch☆18Dec 21, 2022Updated 3 years ago
- Blazing fast data loading with HuggingFace Dataset and Ray Data☆16Jan 12, 2024Updated 2 years ago
- ☆24Dec 11, 2024Updated last year
- Learning to Model Editing Processes☆26Aug 3, 2025Updated 6 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings☆46May 23, 2023Updated 2 years ago
- URL downloader supporting checkpointing and continuous checksumming.☆19Nov 29, 2023Updated 2 years ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)☆476Feb 3, 2026Updated last week
- Torch Distributed Experimental☆117Aug 5, 2024Updated last year
- ☆20Jul 12, 2023Updated 2 years ago
- DL Dataloader Benchmarks☆20Jan 27, 2025Updated last year
- An implementation of the Llama architecture, to instruct and delight☆21May 31, 2025Updated 8 months ago
- ☆20Nov 23, 2022Updated 3 years ago
- ☆51Jan 28, 2024Updated 2 years ago
- ☆19Mar 22, 2024Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆20Oct 23, 2023Updated 2 years ago
- ☆288Updated this week
- Trying to build an all in one speech-text language model - a bit like GPT-4o☆22Jun 1, 2024Updated last year
- ☆23Jun 18, 2024Updated last year
- Scalable and Performant Data Loading☆366Feb 4, 2026Updated last week