Train LLaMA on a single A100 80GB node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism
☆224 · Nov 21, 2023 · Updated 2 years ago
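For context, the training pattern the description above implies looks roughly like the following. This is a minimal sketch, not transpeeder's actual code: the layer list, dataset, and config values are illustrative assumptions, and a real run would be launched across GPUs with the `deepspeed` CLI launcher.

```python
# Minimal sketch of DeepSpeed pipeline-parallel training (illustrative only;
# not transpeeder's actual code). Launch with the `deepspeed` CLI launcher.
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

# Pipeline parallelism wants the model as a flat sequence of layers so
# DeepSpeed can partition it into stages (one stage per GPU here).
layers = [nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 2)]
model = PipelineModule(layers=layers, num_stages=2, loss_fn=nn.CrossEntropyLoss())

ds_config = {
    "train_batch_size": 32,               # global batch per optimizer step
    "train_micro_batch_size_per_gpu": 4,  # micro-batches keep both stages busy
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# train_batch() pulls (input, label) pairs from the iterator itself and
# schedules forward/backward micro-batches across the pipeline stages.
data = [(torch.randn(4, 1024), torch.randint(0, 2, (4,))) for _ in range(64)]
for step in range(10):
    loss = engine.train_batch(data_iter=iter(data))
```

The point of the pipeline engine (vs. plain data parallelism) is that each GPU holds only its own stage's layers, so a model too large for one device's memory can still be trained on a single node.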
Alternatives and similar repositories for transpeeder
Users interested in transpeeder are comparing it to the libraries listed below.
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP. ☆97 · Feb 5, 2024 · Updated 2 years ago
- A prototype repo for hybrid training combining pipeline parallelism and distributed data parallelism, with comments on core code snippets. Feel free to… ☆58 · Jul 4, 2023 · Updated 2 years ago
- ☆84 · Sep 9, 2023 · Updated 2 years ago
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF. ☆69 · May 9, 2023 · Updated 2 years ago
- LLaMA tuning with the Stanford Alpaca dataset using DeepSpeed and Transformers ☆50 · Mar 15, 2023 · Updated 3 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆19 · Jul 20, 2023 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,438 · Mar 20, 2024 · Updated 2 years ago
- Best practice for training LLaMA models in Megatron-LM ☆663 · Jan 2, 2024 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆2,236 · Aug 14, 2025 · Updated 7 months ago
- Automatically split your PyTorch models on multiple GPUs for training & inference ☆657 · Jan 2, 2024 · Updated 2 years ago
- Collaborative Training of Large Language Models in an Efficient Way ☆419 · Aug 28, 2024 · Updated last year
- Distributed trainer for LLMs ☆590 · May 20, 2024 · Updated last year
- Code for "Scaling Laws of RoPE-based Extrapolation" ☆73 · Oct 16, 2023 · Updated 2 years ago
- ☆16 · Mar 30, 2024 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆69 · Jul 20, 2023 · Updated 2 years ago
- Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral) ☆2,697 · Aug 14, 2024 · Updated last year
- Panda: an open-source overseas Chinese large language model project launched in May 2023, dedicated to exploring the full technology stack in the era of large models and to promoting innovation and collaboration in Chinese NLP. ☆1,036 · Oct 19, 2023 · Updated 2 years ago
- Example models using DeepSpeed ☆6,807 · Mar 4, 2026 · Updated 3 weeks ago
- Secrets of RLHF in Large Language Models Part I: PPO ☆1,422 · Mar 3, 2024 · Updated 2 years ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆653 · Jan 15, 2026 · Updated 2 months ago
- Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat ☆118 · Jun 5, 2023 · Updated 2 years ago
- [NeurIPS 2023] RRHF & Wombat ☆808 · Sep 22, 2023 · Updated 2 years ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆755 · Sep 27, 2024 · Updated last year
- BELLE: Be Everyone's Large Language Model Engine (open-source Chinese conversational LLM) ☆8,286 · Oct 16, 2024 · Updated last year
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation ☆475 · Mar 7, 2024 · Updated 2 years ago
- Chinese-LLaMA 1&2 and Chinese-Falcon base models; ChatFlow Chinese dialogue model; Chinese OpenLLaMA model; NLP pre-training and instruction fine-tuning datasets ☆3,054 · Apr 14, 2024 · Updated last year
- Finetuning LLaMA with DeepSpeed ☆10 · Apr 14, 2023 · Updated 2 years ago
- YaRN: Efficient Context Window Extension of Large Language Models (see the RoPE-scaling sketch after this list) ☆1,686 · Apr 17, 2024 · Updated last year
- A large-scale 7B pretraining language model developed by BaiChuan-Inc. ☆5,680 · Jul 18, 2024 · Updated last year
- Pipeline Parallelism for PyTorch ☆786 · Aug 21, 2024 · Updated last year
- Ongoing research training transformer models at scale ☆15,744 · Mar 20, 2026 · Updated last week
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,000 · Dec 6, 2024 · Updated last year
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆4,742 · Jan 8, 2024 · Updated 2 years ago
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆490 · Mar 19, 2024 · Updated 2 years ago
- Mixture-of-Experts (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations. ☆12 · Feb 11, 2024 · Updated 2 years ago
- Implementation of Chinese ChatGPT ☆289 · Nov 20, 2023 · Updated 2 years ago
- An implementation of an autoregressive language model using an improved Transformer and DeepSpeed pipeline parallelism. ☆30 · Jan 12, 2026 · Updated 2 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆449 · Oct 16, 2024 · Updated last year
- WanJuan 1.0 (万卷1.0) multimodal corpus ☆570 · Oct 20, 2023 · Updated 2 years ago
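Several entries above (the RoPE extrapolation scaling-laws code, YaRN, and the training-free long-context work) share one underlying trick: rescaling rotary position embeddings so a model sees familiar angles at positions beyond its training length. As a hedged sketch of that shared idea, here is the plain linear position-interpolation baseline that YaRN refines; the function name and parameters are illustrative, not any listed repo's API:

```python
# Hedged sketch of RoPE linear position interpolation, the simple baseline
# that methods like YaRN refine. Names and defaults here are illustrative.
import torch

def rope_angles(head_dim: int, max_pos: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Angle table for rotary embeddings. With scale > 1, positions are
    compressed so a model trained on max_pos / scale tokens only ever sees
    angles from its original training range."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(max_pos).float() / scale  # linear interpolation
    return torch.outer(positions, inv_freq)            # (max_pos, head_dim // 2)

# Example: stretch a 2K-token model to an 8K window with scale = 4.
angles = rope_angles(head_dim=128, max_pos=8192, scale=4.0)
cos, sin = angles.cos(), angles.sin()  # applied to query/key pairs in attention
```

YaRN's refinement, roughly, is to interpolate frequency bands unevenly rather than dividing all positions by one constant, which preserves high-frequency (local) position information better than this uniform baseline.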