deepseek-ai / DeepSeek-V3.2-Exp
☆683 · Updated this week
Alternatives and similar repositories for DeepSeek-V3.2-Exp
Users interested in DeepSeek-V3.2-Exp are comparing it to the libraries listed below
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆443 · Updated 4 months ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆751 · Updated this week
- ☆816 · Updated 3 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆482 · Updated 3 weeks ago
- ☆427 · Updated last month
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆491 · Updated 7 months ago
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models ☆337 · Updated last month
- [ICLR 2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆238 · Updated 9 months ago
- ☆773 · Updated 3 weeks ago
- [NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning models without training ☆197 · Updated 4 months ago
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆248 · Updated 2 months ago
- slime is an LLM post-training framework for RL Scaling ☆2,023 · Updated this week
- Muon is Scalable for LLM Training ☆1,318 · Updated 2 months ago
- Efficient LLM Inference over Long Sequences ☆391 · Updated 3 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆877 · Updated 6 months ago
- Unleashing the Power of Reinforcement Learning for Math and Code Reasoners ☆723 · Updated 3 months ago
- TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight) ☆372 · Updated last week
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆155 · Updated last week
- Implementation of FP8/INT8 rollout for RL training without performance drop ☆242 · Updated last week
- ☆816 · Updated 2 weeks ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ☆192 · Updated 3 months ago
- ☆75 · Updated 3 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,100 · Updated last month
- Dream 7B, a large diffusion language model ☆984 · Updated last week
- [ICML 2024] CLLMs: Consistency Large Language Models