Sequence-level 1F1B schedule for LLMs.
☆38, updated Aug 26, 2025
Alternatives and similar repositories for Seq1F1B
Users interested in Seq1F1B are comparing it to the repositories listed below.
- Official repository for the paper "DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines" (☆19, updated Dec 8, 2023)
- Vocabulary Parallelism (☆25, updated Mar 10, 2025)
- Distributed IO-aware attention algorithm (☆24, updated Sep 24, 2025)
- Zero Bubble Pipeline Parallelism (☆451, updated May 7, 2025)
- Allow torch tensor memory to be released and resumed later (☆220, updated Feb 9, 2026)
- CUDA SGEMM optimization notes (☆15, updated Oct 31, 2023)
- (☆42, updated Sep 8, 2025)
- Efficient long-context language model training by core attention disaggregation (☆92, updated this week)
- (☆13, updated Feb 22, 2023)
- A benchmark suite especially for deep learning operators (☆42, updated Feb 13, 2023)
- (☆24, updated Aug 15, 2023)
- Utility scripts for PyTorch (e.g. make Perfetto show some disappearing kernels; a memory profiler that understands more low-level allocatio…) (☆90, updated Sep 11, 2025)
- DLSlime: flexible and efficient heterogeneous transfer toolkit (☆92, updated Jan 26, 2026)
- Canvas: End-to-End Kernel Architecture Search in Neural Networks (☆27, updated Nov 18, 2024)
- Surrogate-based hyperparameter tuning system (☆29, updated Jun 29, 2023)
- Ring attention implementation with flash attention (☆987, updated Sep 10, 2025)
- (☆26, updated Dec 5, 2022)
- A lightweight design for computation-communication overlap (☆223, updated Jan 20, 2026)
- (☆633, updated Jan 14, 2026)
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs (☆122, updated Jul 4, 2025)
- Asynchronous pipeline-parallel optimization (☆19, updated Feb 2, 2026)
- NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process com… (☆469, updated Feb 28, 2026)
- (☆78, updated May 4, 2021)
- Symphony: a decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi… (☆30, updated Oct 30, 2025)
- Prefix-aware attention for LLM decoding (☆29, updated Jan 23, 2026)
- gLLM: Global Balanced Pipeline Parallelism System for Distributed LLM Serving with Token Throttling (☆55, updated Jan 12, 2026)
- Research and development for optimizing transformers (☆131, updated Feb 16, 2021)
- SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training (☆36, updated Mar 1, 2023)
- Linear Attention Sequence Parallelism (LASP) (☆89, updated Jun 4, 2024)
- (☆74, updated Sep 15, 2025)
- (☆165, updated Jul 22, 2024)
- (☆10, updated Apr 24, 2024)
- Neural Network Execution Service (☆11, updated Oct 3, 2023)
- Code for an agentic RAG method with a dynamic workflow (☆12, updated Jan 22, 2026)
- Official PyTorch implementation of SinGRAF (CVPR 2023) (☆11, updated Jun 28, 2023)
- 🚀 Train a VLA "large model" end-to-end yourself: 🌏 a 26M-parameter visual multimodal VLM trained from scratch in just 1 hour! (☆27, updated Oct 16, 2025)
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning (☆10, updated Apr 28, 2023)
- SOTA learning-augmented systems (☆37, updated May 21, 2022)
- Terminal tool that converts file encodings to UTF-8 (☆10, updated Oct 5, 2019)