UofT-EcoSystem / Tempo
Memory footprint reduction for transformer models
☆11 · Updated 2 years ago
Alternatives and similar repositories for Tempo
Users interested in Tempo are comparing it to the libraries listed below.
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆174 · Updated this week
- ☆77 · Updated 4 years ago
- ☆43 · Updated 3 years ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆139 · Updated 7 months ago
- 16-fold memory access reduction with nearly no loss ☆109 · Updated 9 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆43 · Updated 3 years ago
- ☆115 · Updated last year
- Autonomous GPU Kernel Generation via Deep Agents ☆202 · Updated this week
- [ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆50 · Updated 5 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆51 · Updated 6 months ago
- pytorch-profiler ☆50 · Updated 2 years ago
- Building the Virtuous Cycle for AI-driven LLM Systems ☆112 · Updated this week
- ☆164 · Updated last year
- DeeperGEMM: crazy optimized version ☆74 · Updated 8 months ago
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆69 · Updated 9 months ago
- Quantized Attention on GPU ☆44 · Updated last year
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆24 · Updated 7 months ago
- ☆45 · Updated 2 years ago
- Official implementation of the ICML 2024 paper "ExCP: Extreme LLM Checkpoint Compression via Weight-Momentum Joint Shrinking". ☆47 · Updated last year
- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Award, Honorable Mention] ☆48 · Updated 10 months ago
- Accelerating Large-Scale Reasoning Model Inference with Sparse Self-Speculative Decoding ☆74 · Updated last month
- ☆39 · Updated 3 weeks ago
- ☆39 · Updated 5 months ago
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆75 · Updated 2 weeks ago
- Python package for rematerialization-aware gradient checkpointing (see the sketch after this list) ☆27 · Updated 2 years ago
- Odysseus: Playground of LLM Sequence Parallelism ☆79 · Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆220 · Updated last year
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆43 · Updated last month
- ☆78 · Updated this week
- A resilient distributed training framework ☆96 · Updated last year
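Several of the repositories above, like Tempo itself, reduce training memory by rematerializing activations instead of storing them. As a point of reference for the rematerialization-aware checkpointing package in the list, here is a minimal, generic sketch of the underlying technique using only PyTorch's built-in `torch.utils.checkpoint`; it does not reflect that package's actual API, and the `CheckpointedMLP` model is a hypothetical example.

```python
# Minimal sketch of rematerialization via gradient checkpointing, using
# PyTorch's built-in torch.utils.checkpoint. The listed package's own API
# may differ; this only illustrates the general technique.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedMLP(nn.Module):  # hypothetical example model
    def __init__(self, dim: int = 1024, depth: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU()) for _ in range(depth)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block in self.blocks:
            # Activations inside `block` are discarded after the forward pass
            # and recomputed (rematerialized) during the backward pass,
            # trading extra compute for a smaller memory footprint.
            x = checkpoint(block, x, use_reentrant=False)
        return x

model = CheckpointedMLP()
out = model(torch.randn(4, 1024, requires_grad=True))
out.sum().backward()  # triggers recomputation of each block's forward
```

The trade-off is roughly one extra forward pass of compute per checkpointed block in exchange for not holding its intermediate activations in memory; rematerialization-aware tools automate the choice of which blocks to checkpoint.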