deepseek-ai / DeepSeek-V3.2-Exp
☆683 · Updated this week
Alternatives and similar repositories for DeepSeek-V3.2-Exp
Users interested in DeepSeek-V3.2-Exp are comparing it to the libraries listed below
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆443 · Updated 4 months ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆751 · Updated this week
- ☆816 · Updated 3 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆482 · Updated 3 weeks ago
- ☆427 · Updated last month
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆491 · Updated 7 months ago
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models ☆337 · Updated last month
- [ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆238 · Updated 9 months ago
- ☆773 · Updated 3 weeks ago
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training ☆197 · Updated 4 months ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆248 · Updated 2 months ago
- slime is an LLM post-training framework for RL Scaling ☆2,023 · Updated this week
- Muon is Scalable for LLM Training ☆1,318 · Updated 2 months ago
- Efficient LLM Inference over Long Sequences ☆391 · Updated 3 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆877 · Updated 6 months ago
- Unleashing the Power of Reinforcement Learning for Math and Code Reasoners ☆723 · Updated 3 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight) ☆372 · Updated last week
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆155 · Updated last week
- Implementation of FP8/INT8 rollout for RL training without performance drop ☆242 · Updated last week
- ☆816 · Updated 2 weeks ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆192 · Updated 3 months ago
- ☆75 · Updated 3 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,100 · Updated last month
- Dream 7B, a large diffusion language model ☆984 · Updated last week
- [ICML 2024] CLLMs: Consistency Large Language Models