☆84 · Sep 9, 2023 · Updated 2 years ago
Alternatives and similar repositories for Megatron-DeepSpeed-Llama
Users that are interested in Megatron-DeepSpeed-Llama are comparing it to the libraries listed below.
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆19 · Jul 20, 2023 · Updated 2 years ago
- A LLaMA 1/LLaMA 2 Megatron implementation. ☆28 · Dec 13, 2023 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆69 · Jul 20, 2023 · Updated 2 years ago
- NTK-scaled version of ALiBi position encoding in Transformer. ☆69 · Aug 16, 2023 · Updated 2 years ago
- Best practice for training LLaMA models in Megatron-LM ☆664 · Jan 2, 2024 · Updated 2 years ago
- Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism ☆224 · Nov 21, 2023 · Updated 2 years ago
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP. ☆97 · Feb 5, 2024 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,438 · Mar 20, 2024 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆2,247 · Aug 14, 2025 · Updated 8 months ago
- Distributed trainer for LLMs ☆589 · May 20, 2024 · Updated last year
- The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud. ☆1,561 · Dec 15, 2025 · Updated 4 months ago
- Towards Systematic Measurement for Long Text Quality ☆38 · Sep 5, 2024 · Updated last year
- Finetuning LLaMA with DeepSpeed ☆10 · Apr 14, 2023 · Updated 3 years ago
- ☆43 · Dec 15, 2023 · Updated 2 years ago
- Apply the Circular to the Pretraining Model ☆38 · Apr 25, 2022 · Updated 4 years ago
- PULSE: Pretrained and Unified Language Service Engine ☆496 · Dec 26, 2023 · Updated 2 years ago
- Implementation of Chinese ChatGPT ☆287 · Nov 20, 2023 · Updated 2 years ago
- Baichuan Dynamic NTK-ALiBi implementation: inference on longer texts without fine-tuning ☆49 · Aug 27, 2023 · Updated 2 years ago
- Code for "Improving Translation Faithfulness of Large Language Models via Augmenting Instructions" ☆12 · Aug 26, 2023 · Updated 2 years ago
- Code for "Scaling Laws of RoPE-based Extrapolation" ☆73 · Oct 16, 2023 · Updated 2 years ago
- Silk Road will be the dataset zoo for Luotuo (骆驼). Luotuo is an open-sourced Chinese LLM project founded by 陈启源 @ Central China Normal University & 李鲁鲁 @ SenseTime & 冷子… ☆40 · Nov 5, 2023 · Updated 2 years ago
- LLaMA inference for TencentPretrain ☆99 · Jun 8, 2023 · Updated 2 years ago
- Collaborative Training of Large Language Models in an Efficient Way ☆420 · Aug 28, 2024 · Updated last year
- [MICCAI'25] LesionDiffusion, a general, text-controllable lesion synthesis foundation model for 3D CT imaging. ☆20 · Jan 19, 2026 · Updated 3 months ago
- Tencent pre-training framework in PyTorch & pre-trained model zoo ☆1,087 · Aug 4, 2024 · Updated last year
- Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data. ☆1,012 · Jul 29, 2024 · Updated last year
- ☆12 · Nov 10, 2023 · Updated 2 years ago
- ChatGLM-6B instruction learning | instruction data | Instruct ☆652 · Apr 10, 2023 · Updated 3 years ago
- ☆20 · Feb 4, 2021 · Updated 5 years ago
- ☆26 · Jun 5, 2023 · Updated 2 years ago
- The official evaluation suite and dynamic data release for MixEval. ☆11 · Sep 23, 2024 · Updated last year
- Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pretraining/instruction fine-tuning datasets ☆3,050 · Apr 14, 2024 · Updated 2 years ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference ☆666 · Jan 15, 2026 · Updated 3 months ago
- ☆21 · Oct 13, 2021 · Updated 4 years ago
- This repository contains code for the *SEM 2023 paper "Generative Data Augmentation for Aspect Sentiment Quad Prediction". ☆11 · May 30, 2023 · Updated 2 years ago
- Research code for various data-parallel pretraining setups, based on PyTorch GPT-2. ☆11 · Dec 16, 2022 · Updated 3 years ago
- Fast LLM training codebase with dynamic strategy choosing [DeepSpeed + Megatron + FlashAttention + CUDA fusion kernels + compiler] ☆40 · Jan 4, 2024 · Updated 2 years ago
- Code for the EMNLP 2021 paper "CLIFF: Contrastive Learning for Improving Faithfulness and Factuality in Abstractive Summarization" ☆47 · Jan 17, 2022 · Updated 4 years ago
- Triton implementation of FlashAttention-2 ☆52 · Jul 31, 2023 · Updated 2 years ago
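Several entries above (the NTK-scaled ALiBi repo and the Baichuan Dynamic NTK-ALiBi implementation) revolve around one idea: ALiBi biases attention scores with a per-head linear distance penalty, and softening the head slopes lets a model handle contexts longer than it was trained on without fine-tuning. A minimal sketch of the standard ALiBi slopes with an optional scaling knob; the `ntk_scale` exponent tweak is an illustrative assumption, not the exact formula used by those repos:

```python
def alibi_slopes(n_heads: int, ntk_scale: float = 1.0) -> list:
    # Standard ALiBi assigns head i (1-indexed) the slope 2^(-8*i/n_heads);
    # the bias added to the attention score at token distance d is -slope * d.
    # ntk_scale > 1 flattens the slopes (an NTK-style softening, assumed here
    # for illustration) so distant tokens are penalized less at long contexts.
    return [2.0 ** (-8.0 * i / (n_heads * ntk_scale))
            for i in range(1, n_heads + 1)]

# With 8 heads the default slopes form a geometric sequence from 1/2 to 1/256.
slopes = alibi_slopes(8)
```

Because the bias is additive and position-free, no learned position embeddings need to be re-trained when the slope schedule is adjusted.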
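Likewise, the "Scaling Laws of RoPE-based Extrapolation" entry concerns stretching RoPE past the training length. The widely used NTK-aware trick enlarges the rotary base so the lowest frequency stretches by roughly the target factor while high frequencies barely move; a sketch of that standard adjustment (the function name is mine, and the repo's own method may differ):

```python
def ntk_rope_base(base: float = 10000.0, scale: float = 4.0,
                  dim: int = 128) -> float:
    # NTK-aware base adjustment: base' = base * scale^(dim / (dim - 2)).
    # With scale == 1.0 the base is unchanged; larger scales stretch the
    # lowest RoPE frequency by about `scale` without any fine-tuning.
    return base * scale ** (dim / (dim - 2))
```

The adjusted base is then used to build the usual inverse-frequency table, so the change is a one-line config tweak rather than a model surgery.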