deepseek-ai / ESFT
Expert Specialized Fine-Tuning
☆392 Updated 4 months ago
Alternatives and similar repositories for ESFT:
Users who are interested in ESFT are comparing it to the libraries listed below.
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models ☆1,187 Updated last year
- OLMoE: Open Mixture-of-Experts Language Models ☆539 Updated last month
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models ☆1,761 Updated 9 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆291 Updated last month
- ☆326 Updated 5 months ago
- Search-o1: Agentic Search-Enhanced Large Reasoning Models ☆515 Updated this week
- Code for Quiet-STaR ☆706 Updated 5 months ago
- [ACL 2024] Progressive LLaMA with Block Expansion. ☆496 Updated 8 months ago
- Evaluation suite for LLMs ☆330 Updated last month
- Arena-Hard-Auto: An automatic LLM benchmark. ☆718 Updated last month
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆695 Updated 4 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆292 Updated this week
- A curated list of open-source projects related to DeepSeek Coder ☆425 Updated 9 months ago
- ☆301 Updated 4 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆233 Updated 2 months ago
- LiveBench: A Challenging, Contamination-Free LLM Benchmark ☆433 Updated this week
- Recipes to scale inference-time compute of open models ☆975 Updated 2 weeks ago
- ☆489 Updated 2 months ago
- ☆868 Updated last week
- AN O1 REPLICATION FOR CODING ☆311 Updated last month
- An Open Source Toolkit For LLM Distillation ☆442 Updated 3 weeks ago
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ☆282 Updated last week
- [NeurIPS'24 Spotlight, ICLR'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention, which r… ☆891 Updated last week
- Large Reasoning Models ☆801 Updated last month
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper ☆125 Updated 6 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆368 Updated 2 months ago
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context ☆450 Updated 10 months ago
- [NeurIPS'24] SelfCodeAlign: Self-Alignment for Code Generation ☆293 Updated 2 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆967 Updated 6 months ago