stepfun-ai / Step3
☆428 · Updated 2 months ago
Alternatives and similar repositories for Step3
Users who are interested in Step3 are comparing it to the libraries listed below.
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆581 · Updated 2 weeks ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆249 · Updated 3 months ago
- A sparse attention kernel supporting mixed sparse patterns ☆322 · Updated 8 months ago
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training ☆537 · Updated last week
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆494 · Updated 8 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆158 · Updated last month
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆582 · Updated last week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆439 · Updated this week
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆242 · Updated 2 months ago
- VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo ☆1,231 · Updated this week
- Code for the paper [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆147 · Updated last week
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆191 · Updated 3 weeks ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆239 · Updated 3 months ago
- 青稞Talk ☆151 · Updated last week
- siiRL: Shanghai Innovation Institute RL Framework for Advanced LLMs and Multi-Agent Systems ☆222 · Updated this week
- ☆203 · Updated 6 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆338 · Updated 3 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆261 · Updated 3 weeks ago
- Efficient Triton implementation of Native Sparse Attention. ☆238 · Updated 5 months ago
- A parallelized VAE that avoids OOM for high-resolution image generation ☆81 · Updated 2 months ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆448 · Updated 5 months ago
- SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention ☆113 · Updated this week
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache… ☆171 · Updated last month
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆323 · Updated 6 months ago
- DeepSeek Native Sparse Attention PyTorch implementation ☆106 · Updated 2 weeks ago
- FlagScale is a large model toolkit based on open-sourced projects. ☆362 · Updated last week
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆68 · Updated 2 months ago
- A Unified Cache Acceleration Framework for 🤗Diffusers: Qwen-Image-Lightning, Qwen-Image, HunyuanImage, Wan, FLUX, etc. ☆421 · Updated this week
- 16-fold memory access reduction with nearly no loss ☆105 · Updated 7 months ago
- qwen-nsa ☆79 · Updated last week