Fast, memory-efficient attention column reduction (e.g., sum, mean, max)
★ 42 · Feb 10, 2026 · Updated last month
Alternatives and similar repositories for flash-colreduce
Users that are interested in flash-colreduce are comparing it to the libraries listed below.
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference ★ 18 · Jun 19, 2025 · Updated 9 months ago
- Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” ★ 140 · Mar 7, 2026 · Updated 2 weeks ago
- ★ 20 · Nov 21, 2025 · Updated 4 months ago
- [EMNLP 2025 Main] SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning ★ 34 · Jan 11, 2026 · Updated 2 months ago
- An agent for CUDA compute-communication kernel co-design ★ 32 · Mar 11, 2026 · Updated last week
- A record of reading list on some MLsys popular topic ★ 23 · Mar 20, 2025 · Updated last year
- A repo for publishing solution to 3DCoMPaT++ challenge on an improved large-scale 3D vision dataset for compositional recognition ★ 14 · Jun 22, 2023 · Updated 2 years ago
- [ECCV 2024] SparseRefine: Sparse Refinement for Efficient High-Resolution Semantic Segmentation ★ 14 · Jan 10, 2025 · Updated last year
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ★ 156 · Feb 27, 2026 · Updated 3 weeks ago
- [ICLR 2026 Oral] Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation ★ 92 · Mar 12, 2026 · Updated last week
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ★ 39 · Jan 27, 2026 · Updated last month
- A fast and powerful CLI tool for finding secrets and other data in files, web pages, and other text sources. Supports multi-threading and… ★ 21 · Updated this week
- [ICCV 2025] EA-ViT: Efficient Adaptation for Elastic Vision Transformer ★ 27 · Jul 28, 2025 · Updated 7 months ago
- [WACV 2025] Official code release for Transientangelo: Few-Viewpoint Surface Reconstruction Using Single-Photon Lidar ★ 20 · Oct 29, 2024 · Updated last year
- ★ 24 · Jan 29, 2026 · Updated last month
- [ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging ★ 44 · Mar 16, 2026 · Updated last week
- A Compiler from "Mx* language" (A C++ & Java like language) to RV32I Assembly, with optimizations on LLVM IR. SJTU CS2966 Project. ★ 12 · Feb 12, 2023 · Updated 3 years ago
- ★ 18 · Apr 8, 2025 · Updated 11 months ago
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity ★ 71 · Mar 10, 2026 · Updated last week
- SJTU CS2951 Computer Architecture Course Project, A Verilog HDL implemented RISC-V CPU. ★ 11 · Jan 15, 2022 · Updated 4 years ago
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models ★ 69 · May 15, 2025 · Updated 10 months ago
- APRIL: Active Partial Rollouts in Reinforcement Learning to Tame Long-tail Generation. A system-level optimization for scalable LLM tra… ★ 54 · Oct 11, 2025 · Updated 5 months ago
- ★ 14 · Sep 11, 2025 · Updated 6 months ago
- [ICLR 2024] The Need for Speed: Pruning Transformers with One Recipe ★ 30 · Sep 2, 2024 · Updated last year
- [NeurIPS 2025] FastVID: Dynamic Density Pruning for Fast Video Large Language Models ★ 31 · Nov 10, 2025 · Updated 4 months ago
- My Non-Python Utility Functions ★ 16 · May 3, 2019 · Updated 6 years ago
- ★ 35 · Jun 3, 2025 · Updated 9 months ago
- ★ 13 · Jul 3, 2024 · Updated last year
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ★ 25 · Feb 21, 2025 · Updated last year
- Project Page for GaussianFormer ★ 24 · May 30, 2024 · Updated last year
- Course Projects for Stanford CS142 Web Applications ★ 10 · Oct 15, 2016 · Updated 9 years ago
- ★ 13 · May 15, 2025 · Updated 10 months ago
- DFlash: Block Diffusion for Flash Speculative Decoding ★ 652 · Updated this week
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ★ 200 · Nov 17, 2025 · Updated 4 months ago
- ★ 23 · Nov 16, 2024 · Updated last year
- Repository for answers for exercises in Programming Massively Parallel Processors book ★ 16 · Aug 10, 2024 · Updated last year
- Extending context length of visual language models ★ 12 · Dec 18, 2024 · Updated last year
- A few shell scripts which shows you some fun facts about your Facebook profile. ★ 17 · Oct 21, 2012 · Updated 13 years ago
- Prioritize Alignment in Dataset Distillation ★ 21 · Dec 3, 2024 · Updated last year