Efficient Long-context Language Model Training by Core Attention Disaggregation
☆91 · Feb 23, 2026 · Updated last week
Alternatives and similar repositories for DistCA
Users who are interested in DistCA are comparing it to the libraries listed below.
- A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and cach… ☆58 · Oct 27, 2025 · Updated 4 months ago
- DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit ☆92 · Jan 26, 2026 · Updated last month
- Vortex: A Flexible and Efficient Sparse Attention Framework ☆48 · Jan 21, 2026 · Updated last month
- An auxiliary project analysis of the characteristics of KV in DiT Attention. ☆33 · Nov 29, 2024 · Updated last year
- Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25] ☆41 · May 13, 2025 · Updated 9 months ago
- Deferred Continuous Batching in Resource-Efficient Large Language Model Serving (EuroMLSys 2024) ☆19 · May 28, 2024 · Updated last year
- [NeurIPS 2024] Efficient LLM Scheduling by Learning to Rank ☆71 · Nov 4, 2024 · Updated last year
- APEX+ is an LLM Serving Simulator ☆42 · Jun 16, 2025 · Updated 8 months ago
- Prefix-Aware Attention for LLM Decoding ☆29 · Jan 23, 2026 · Updated last month
- Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow" ☆77 · Oct 15, 2025 · Updated 4 months ago
- ☆65 · Apr 26, 2025 · Updated 10 months ago
- [ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning ☆10 · Apr 28, 2023 · Updated 2 years ago
- Triton-based Symmetric Memory operators and examples ☆86 · Jan 15, 2026 · Updated last month
- d3LLM: Ultra-Fast Diffusion LLM 🚀 ☆93 · Feb 4, 2026 · Updated last month
- ☆226 · Nov 19, 2025 · Updated 3 months ago
- ☆118 · May 19, 2025 · Updated 9 months ago
- How to plot for papers, slides, demos, etc. ☆10 · Apr 7, 2022 · Updated 3 years ago
- a simple API to use CUPTI ☆11 · Aug 19, 2025 · Updated 6 months ago
- Notes for the book Fluent Python, 1st Edition (O'Reilly, 2015) ☆11 · Jun 30, 2022 · Updated 3 years ago
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… ☆51 · Oct 11, 2025 · Updated 4 months ago
- ☆150 · Oct 9, 2024 · Updated last year
- An experimental communicating attention kernel based on DeepEP. ☆35 · Jul 29, 2025 · Updated 7 months ago
- High-performance distributed data shuffling (all-to-all) library for MoE training and inference ☆112 · Updated this week
- ☆25 · Oct 11, 2025 · Updated 4 months ago
- Expert Specialization MoE Solution based on CUTLASS ☆27 · Jan 19, 2026 · Updated last month
- ☆15 · Feb 24, 2026 · Updated last week
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) ☆30 · Jun 14, 2024 · Updated last year
- ☆52 · May 19, 2025 · Updated 9 months ago
- ☆19 · Jun 1, 2025 · Updated 9 months ago
- LoRAFusion: Efficient LoRA Fine-Tuning for LLMs ☆24 · Sep 23, 2025 · Updated 5 months ago
- A fast text search engine built for SSDs, written in C++. ☆11 · Aug 29, 2022 · Updated 3 years ago
- extensible collectives library in triton ☆95 · Mar 31, 2025 · Updated 11 months ago
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ☆138 · Dec 5, 2025 · Updated 2 months ago
- Implementation from scratch in C of the Multi-head latent attention used in the Deepseek-v3 technical paper. ☆18 · Jan 15, 2025 · Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer ☆163 · Feb 11, 2026 · Updated 3 weeks ago
- Perplexity open source garden for inference technology ☆371 · Dec 25, 2025 · Updated 2 months ago
- Sequence-level 1F1B schedule for LLMs. ☆38 · Aug 26, 2025 · Updated 6 months ago
- A lightweight design for computation-communication overlap. ☆223 · Jan 20, 2026 · Updated last month
- [Archived] For the latest updates and community contribution, please visit: https://github.com/Ascend/TransferQueue or https://gitcode.co… ☆13 · Jan 16, 2026 · Updated last month