genggui001 / Megatron-DeepSpeed-Llama
☆84Sep 9, 2023Updated 2 years ago
Alternatives and similar repositories for Megatron-DeepSpeed-Llama
Users who are interested in Megatron-DeepSpeed-Llama are comparing it to the libraries listed below.
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆19Jul 20, 2023Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆69Jul 20, 2023Updated 2 years ago
- A LLaMA1/LLaMA2 Megatron implementation.☆28Dec 13, 2023Updated 2 years ago
- NTK scaled version of ALiBi position encoding in Transformer.☆69Aug 16, 2023Updated 2 years ago
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆224Nov 21, 2023Updated 2 years ago
- Best practice for training LLaMA models in Megatron-LM☆664Jan 2, 2024Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆1,433Mar 20, 2024Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆2,224Aug 14, 2025Updated 6 months ago
- Towards Systematic Measurement for Long Text Quality☆37Sep 5, 2024Updated last year
- The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.☆1,527Dec 15, 2025Updated 2 months ago
- distributed trainer for LLMs☆588May 20, 2024Updated last year
- Finetuning LLaMA with DeepSpeed☆10Apr 14, 2023Updated 2 years ago
- ☆43Dec 15, 2023Updated 2 years ago
- Code implementation of Baichuan's Dynamic NTK-ALiBi: inference on longer texts without fine-tuning☆49Aug 27, 2023Updated 2 years ago
- Built upon Megatron-Deepspeed and HuggingFace Trainer, EasyLLM has reorganized the code logic with a focus on usability. While enhancing …☆49Sep 18, 2024Updated last year
- ☆12Nov 10, 2023Updated 2 years ago
- This project uses pretrained models such as BERT to implement the multiple-choice machine reading comprehension task (Multiple Choice MRC)☆16Jun 20, 2021Updated 4 years ago
- Implementation of Chinese ChatGPT☆288Nov 20, 2023Updated 2 years ago
- [AAAI 2026] SIFThinker: Spatially-Aware Image Focus for Visual Reasoning☆22Dec 2, 2025Updated 2 months ago
- A pointer-generator network based on the BART language model, for Chinese grammatical error correction☆16Sep 8, 2022Updated 3 years ago
- Collaborative Training of Large Language Models in an Efficient Way☆419Aug 28, 2024Updated last year
- Tencent Pre-training framework in PyTorch & Pre-trained Model Zoo☆1,089Aug 4, 2024Updated last year
- ☆43Jan 21, 2025Updated last year
- Silk Road will be the dataset zoo for Luotuo(骆驼). Luotuo is an open-source Chinese-LLM project founded by 陈启源 @ Central China Normal University & 李鲁鲁 @ SenseTime & 冷子…☆40Nov 5, 2023Updated 2 years ago
- ☆19May 11, 2024Updated last year
- ☆16Mar 30, 2024Updated last year
- code for Scaling Laws of RoPE-based Extrapolation☆73Oct 16, 2023Updated 2 years ago
- Transformer related optimization, including BERT, GPT☆17Jul 29, 2023Updated 2 years ago
- ☆21Sep 12, 2023Updated 2 years ago
- Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pretraining/instruction fine-tuning datasets☆3,055Apr 14, 2024Updated last year
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.☆1,008Jul 29, 2024Updated last year
- PULSE: Pretrained and Unified Language Service Engine☆494Dec 26, 2023Updated 2 years ago
- Codes for NAACL 2021 paper 'Noisy Self-Knowledge Distillation for Text Summarization'☆24Jul 27, 2021Updated 4 years ago
- SMP 2018 user profiling technical evaluation☆21Jul 17, 2018Updated 7 years ago
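- A pure C++ cross-platform LLM acceleration library, callable from Python; supports baichuan, glm, llama, and moss base models; runs smoothly on mobile phones; chatglm-6B-class models can reach 10000+ tokens/s on a single GPU,☆43Aug 16, 2023Updated 2 years ago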
- ☆23Mar 31, 2023Updated 2 years ago
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models☆23Jul 27, 2024Updated last year
- How to train an LLM tokenizer☆153Jul 13, 2023Updated 2 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆23Mar 15, 2024Updated last year