A curated list of recent papers on efficient video attention for video diffusion models, covering sparsification, quantization, caching, and related techniques.
☆61 · Oct 27, 2025 · Updated 7 months ago
Alternatives and similar repositories for Awesome-Video-Attention
Users interested in Awesome-Video-Attention are comparing it to the repositories listed below.
- Efficient Long-context Language Model Training by Core Attention Disaggregation · ☆105 · Apr 7, 2026 · Updated 2 months ago
- [NeurIPS 2025] Scaling Speculative Decoding with Lookahead Reasoning · ☆69 · Oct 31, 2025 · Updated 7 months ago
- High performance inference engine for diffusion models · ☆108 · Sep 5, 2025 · Updated 9 months ago
- Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning · ☆64 · Dec 18, 2025 · Updated 5 months ago
- ☆91 · Oct 17, 2025 · Updated 7 months ago
- [ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection · ☆13 · Apr 12, 2024 · Updated 2 years ago
- ☆20 · May 30, 2024 · Updated 2 years ago
- ☆173 · Updated this week
- Pygloo provides Python bindings for Gloo. · ☆22 · Jul 7, 2025 · Updated 11 months ago
- Fabric Stain Detection System based on the YOLO algorithm · ☆20 · Jan 28, 2025 · Updated last year
- [Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models · ☆20 · Oct 2, 2024 · Updated last year
- FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854) · ☆73 · May 13, 2026 · Updated 3 weeks ago
- ☆101 · May 10, 2026 · Updated 3 weeks ago
- [ACL 2023] Contextual Distortion Reveals Constituency: Mask Language Models are Implicit Parsers. · ☆14 · Jun 3, 2023 · Updated 3 years ago
- Context parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/) · ☆427 · Jul 5, 2025 · Updated 11 months ago
- See https://github.com/cuda-mode/triton-index/ instead! · ☆11 · May 8, 2024 · Updated 2 years ago
- DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing (WACV 2025) · ☆13 · Feb 7, 2026 · Updated 4 months ago
- Distributed parallel 3D-Causal-VAE for efficient training and inference · ☆47 · Aug 20, 2025 · Updated 9 months ago
- Terminal UI for NVIDIA Nsight Systems profiles: timeline viewer, kernel navigator, NVTX hierarchy · ☆58 · Updated this week
- Expert Specialization MoE Solution based on CUTLASS · ☆27 · Apr 14, 2026 · Updated last month
- PyTorch: training an EfficientNet model with pseudo-labels · ☆11 · Dec 28, 2019 · Updated 6 years ago
- ☆13 · Jan 21, 2024 · Updated 2 years ago
- ☆49 · May 16, 2026 · Updated 3 weeks ago
- ☆18 · Apr 21, 2024 · Updated 2 years ago
- ☆13 · Sep 7, 2024 · Updated last year
- ☆16 · Aug 7, 2024 · Updated last year
- Toolchain built around the Megatron-LM for Distributed Training · ☆95 · May 20, 2026 · Updated 2 weeks ago
- High Performance KV Cache Store for LLM · ☆56 · May 20, 2026 · Updated 2 weeks ago
- Official Implementation for [ICLR26] DefensiveKV: Taming the Fragility of KV Cache Eviction in LLM Inference · ☆44 · Mar 28, 2026 · Updated 2 months ago
- NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer · ☆192 · Feb 11, 2026 · Updated 3 months ago
- See vLLM official support: https://github.com/vllm-project/vllm-ascend · ☆11 · Feb 5, 2025 · Updated last year
- A simple API for using CUPTI · ☆10 · Aug 19, 2025 · Updated 9 months ago
- Open-source toolkit for training, Priming, and serving next-generation Hybrid architectures · ☆71 · May 9, 2026 · Updated last month
- DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling · ☆24 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆13 · Updated this week
- Advanced Scalable Systems for X · ☆86 · Apr 15, 2026 · Updated last month
- ☆254 · Jan 2, 2025 · Updated last year
- [Findings of EMNLP 2022] Code for the paper "Generative Prompt Tuning for Relation Classification" (https://arxiv.org/abs/2210.12435) · ☆20 · May 7, 2023 · Updated 3 years ago
- The aim of this project is to develop a model capable of detecting fabric defects. · ☆11 · Dec 13, 2023 · Updated 2 years ago