z-lab/flash-colreduce

Fast, memory-efficient attention column reduction (e.g., sum, mean, max)

☆37

Alternatives and similar repositories for flash-colreduce

Users who are interested in flash-colreduce compare it to the libraries listed below.

bscho333 / ReVisiT
☆20 · Nov 21, 2025 · Updated 3 months ago
AIoT-MLSys-Lab / MEDA
[NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
☆17 · Jun 19, 2025 · Updated 8 months ago
yichuan-w / MLsys_reading_list
A record of reading list on some MLsys popular topic
☆22 · Mar 20, 2025 · Updated 11 months ago
zju-jiyicheng / SpecVLM
[EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning
☆34 · Jan 11, 2026 · Updated last month
mit-han-lab / fastrl
[ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
☆138 · Dec 5, 2025 · Updated 2 months ago
mit-han-lab / fouroversix
Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling”
☆130 · Updated this week
RLsys-Foundation / APRIL
APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra…
☆51 · Oct 11, 2025 · Updated 4 months ago
PluralisResearch / AsyncPP
Asynchronous pipeline parallel optimization
☆19 · Feb 2, 2026 · Updated last month
xuyang-liu16 / GlobalCom2
[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆38 · Jan 27, 2026 · Updated last month
GradientHQ / symphony
Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi…
☆30 · Oct 30, 2025 · Updated 4 months ago
Cattalyya / 3DCoMPaT-challenge
A repo for publishing solution to 3DCoMPaT++ challenge on an improved large-scale 3D vision dataset for compositional recognition
☆14 · Jun 22, 2023 · Updated 2 years ago
flashserve / PAT
Prefix-Aware Attention for LLM Decoding
☆29 · Jan 23, 2026 · Updated last month
z-lab / sparselora
[ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
☆71 · Jul 5, 2025 · Updated 7 months ago
SJTU-IPADS / Wukong-S
A distributed stream querying engine that provides sub-millisecond stateful query at millions of queries per-second over fast-evolving li…
☆10 · Jul 18, 2018 · Updated 7 years ago
sinhutt / MEAN
Course Projects for Stanford CS142 Web Applications
☆10 · Oct 15, 2016 · Updated 9 years ago
chenyiqun / Agentic-RAG
This is the code of a agentic rag method with dynamic workflow.
☆12 · Jan 22, 2026 · Updated last month
tanvir-utexas / PaPr
☆13 · Jul 3, 2024 · Updated last year
t123yh / MIPSCPU
A simple MIPS CPU for BUAA CO course (and now NSCSCC).
☆10 · May 15, 2021 · Updated 4 years ago
ali-k-hesar / how-AI-Sees-Our-World
Vision Transformer (ViT) models, with their attention mechanisms, revolutionized computer vision. By merging Class Activation Map (CAM) a…
☆13 · Aug 14, 2023 · Updated 2 years ago
lzhangbv / acpsgd
[ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
☆10 · Apr 28, 2023 · Updated 2 years ago
G-JWLee / TAMP
☆13 · May 15, 2025 · Updated 9 months ago
xizaoqu / blender_for_UniHSI
☆12 · Mar 5, 2024 · Updated last year
yutxie / cpu-riscv
ACM Class 2017 Computer Architecture
☆10 · Jan 11, 2018 · Updated 8 years ago
yqm-307 / bbtools-rpc
Self-implemented coroutines and scheduler based on boost context, used to build an RPC framework.
☆10 · May 9, 2025 · Updated 9 months ago
wu-kan / wuk_cupti_wrapper
a simple API to use CUPTI
☆11 · Aug 19, 2025 · Updated 6 months ago
hyungjin-chung / VPS
☆14 · Sep 11, 2025 · Updated 5 months ago
GuangtaoLyu / PSSTRNet
☆13 · Jul 28, 2024 · Updated last year
hydro-project / cidr2021
paper and code for New Directions in Cloud Programming, CIDR 2021
☆11 · Feb 17, 2021 · Updated 5 years ago
mapleFU / mwish-leveldb-notes
My notes for reading leveldb
☆11 · Apr 19, 2024 · Updated last year
mit-han-lab / lpd
[ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation
☆87 · Feb 7, 2026 · Updated 3 weeks ago
parallel101 / hw08
☆11 · Sep 12, 2023 · Updated 2 years ago
Nov11 / boltdb_in_cpp
read source code of boltdb & re-implement it in c++
☆12 · Jun 2, 2018 · Updated 7 years ago
lzk901372 / MM-When2Speak
☆14 · May 20, 2025 · Updated 9 months ago
li1553770945 / PAL_OperatingSystem
A simple OperatingSystem
☆10 · Sep 9, 2022 · Updated 3 years ago
WolodjaZ / MSAE
Interpreting CLIP with Hierarchical Sparse Autoencoders (ICML 2025)
☆20 · Jan 17, 2026 · Updated last month
Xiangyue-Zhang / EchoMask
[🔥ACM MM2025] EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
☆23 · Dec 30, 2025 · Updated 2 months ago
apple / dmel-demo
dMel: Speech Tokenization Made Simple
☆16 · May 13, 2025 · Updated 9 months ago
SiriusNEO / NightWizard
SJTU CS2951 Computer Architecture Course Project, A Verilog HDL implemented RISC-V CPU.
☆10 · Jan 15, 2022 · Updated 4 years ago
HKUST-SAIL / NoiseAR
Official implementation of "NoiseAR: AutoRegressing Initial Noise Prior for Diffusion Models"
☆18 · Jun 3, 2025 · Updated 9 months ago