deepseek-ai / ESFT
Expert Specialized Fine-Tuning
☆708 Updated 5 months ago
Alternatives and similar repositories for ESFT
Users who are interested in ESFT are comparing it to the libraries listed below:
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models ☆1,820 Updated last year
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models ☆2,961 Updated last year
- OLMoE: Open Mixture-of-Experts Language Models ☆898 Updated last month
- ☆540 Updated last year
- Muon is Scalable for LLM Training ☆1,348 Updated 3 months ago
- An Open Large Reasoning Model for Real-World Solutions ☆1,524 Updated 5 months ago
- [ICML 2025 Oral] CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction ☆557 Updated 6 months ago
- ☆817 Updated 4 months ago
- Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling ☆449 Updated 5 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,950 Updated 7 months ago
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆4,002 Updated last year
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆611 Updated 7 months ago
- Large Reasoning Models ☆806 Updated 11 months ago
- A curated list of open-source projects related to DeepSeek Coder ☆720 Updated last year
- Scalable toolkit for efficient model reinforcement ☆977 Updated last week
- ☆1,348 Updated 11 months ago
- Fully open data curation for reasoning models ☆2,132 Updated 2 months ago
- LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens. ☆264 Updated last week
- Dream 7B, a large diffusion language model ☆1,040 Updated last month
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆353 Updated 10 months ago
- Analyze computation-communication overlap in V3/R1. ☆1,112 Updated 7 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆1,121 Updated 2 weeks ago
- ☆1,331 Updated last month
- ☆963 Updated 9 months ago
- Expert Parallelism Load Balancer ☆1,291 Updated 7 months ago
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention… ☆1,147 Updated last month
- An Open Source Toolkit For LLM Distillation ☆760 Updated 3 months ago
- A Self-adaptation Framework🐙 that adapts LLMs for unseen tasks in real-time! ☆1,160 Updated 9 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆697 Updated 3 months ago
- ☆966 Updated last month