Train LLaMA on a single A100 80GB node using 🤗 Transformers and 🚀 DeepSpeed Pipeline Parallelism
☆224 · Nov 21, 2023 · Updated 2 years ago
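For context on the comparisons below: transpeeder and several listed repos rely on DeepSpeed pipeline parallelism, which splits a model's layer stack into contiguous stages, one per GPU. A minimal standalone sketch of uniform stage partitioning (plain Python with a hypothetical helper name, not DeepSpeed's actual API, which handles this inside `PipelineModule`):

```python
# Illustrative sketch: pipeline parallelism assigns contiguous runs of
# layers to stages (GPUs). This mimics a "uniform" partitioning strategy.

def partition_uniform(num_layers: int, num_stages: int) -> list[range]:
    """Split num_layers consecutive layers into num_stages contiguous
    groups, as evenly as possible."""
    base, extra = divmod(num_layers, num_stages)
    bounds = [0]
    for stage in range(num_stages):
        # earlier stages absorb the remainder, one extra layer each
        bounds.append(bounds[-1] + base + (1 if stage < extra else 0))
    return [range(bounds[i], bounds[i + 1]) for i in range(num_stages)]

# e.g. a 32-layer LLaMA-7B decoder stack split over 4 pipeline stages:
stages = partition_uniform(32, 4)
print([list(r) for r in stages][0])  # layers 0-7 land on stage 0
```

Each stage then only holds its own slice of the weights, which is what lets a large model fit on a single multi-GPU node.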
Alternatives and similar repositories for transpeeder
Users that are interested in transpeeder are comparing it to the libraries listed below.
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP. ☆97 · Feb 5, 2024 · Updated 2 years ago
- A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on core code snippets. Feel free to… ☆58 · Jul 4, 2023 · Updated 2 years ago
- ☆84 · Sep 9, 2023 · Updated 2 years ago
- Complete training code for an open-source, high-performance Llama model, covering the full process from pre-training to RLHF. ☆69 · May 9, 2023 · Updated 3 years ago
- LLaMA tuning with the Stanford Alpaca dataset using DeepSpeed and Transformers. ☆49 · Mar 15, 2023 · Updated 3 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2. ☆19 · Jul 20, 2023 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2. ☆1,439 · Mar 20, 2024 · Updated 2 years ago
- Best practices for training LLaMA models in Megatron-LM. ☆664 · Jan 2, 2024 · Updated 2 years ago
- Automatically split your PyTorch models across multiple GPUs for training & inference. ☆654 · Jan 2, 2024 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2. ☆2,246 · Aug 14, 2025 · Updated 8 months ago
- Collaborative training of large language models in an efficient way. ☆420 · Aug 28, 2024 · Updated last year
- Distributed trainer for LLMs. ☆589 · May 20, 2024 · Updated last year
- Code for "Scaling Laws of RoPE-based Extrapolation". ☆73 · Oct 16, 2023 · Updated 2 years ago
- ☆16 · Mar 30, 2024 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2. ☆69 · Jul 20, 2023 · Updated 2 years ago
- Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral). ☆2,692 · Aug 14, 2024 · Updated last year
- Panda: an open-source overseas Chinese large language model project launched in May 2023, dedicated to exploring the full technology stack in the era of large models and to advancing innovation and collaboration in Chinese NLP. ☆1,034 · Oct 19, 2023 · Updated 2 years ago
- Example models using DeepSpeed. ☆6,820 · Mar 30, 2026 · Updated last month
- Secrets of RLHF in Large Language Models Part I: PPO. ☆1,427 · Mar 3, 2024 · Updated 2 years ago
- Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat. ☆118 · Jun 5, 2023 · Updated 2 years ago
- [NeurIPS 2023] RRHF & Wombat. ☆806 · Sep 22, 2023 · Updated 2 years ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for long-context transformer model training and inference. ☆667 · Jan 15, 2026 · Updated 3 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆758 · Sep 27, 2024 · Updated last year
- BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational LLM). ☆8,279 · Oct 16, 2024 · Updated last year
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation. ☆476 · Mar 7, 2024 · Updated 2 years ago
- Chinese-LLaMA 1&2 and Chinese-Falcon base models; the ChatFlow Chinese dialogue model; the Chinese OpenLLaMA model; NLP pre-training and instruction-tuning datasets. ☆3,050 · Apr 14, 2024 · Updated 2 years ago
- Fine-tuning LLaMA with DeepSpeed. ☆10 · Apr 14, 2023 · Updated 3 years ago
- YaRN: Efficient Context Window Extension of Large Language Models. ☆1,710 · Apr 17, 2024 · Updated 2 years ago
- Pipeline parallelism for PyTorch. ☆786 · Aug 21, 2024 · Updated last year
- Ongoing research training transformer models at scale. ☆16,253 · Updated this week
- A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF). ☆4,746 · Jan 8, 2024 · Updated 2 years ago
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024). ☆1,001 · Dec 6, 2024 · Updated last year
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context". ☆497 · Mar 19, 2024 · Updated 2 years ago
- Mixture-of-Experts (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations. ☆12 · Feb 11, 2024 · Updated 2 years ago
- Implementation of a Chinese ChatGPT. ☆287 · Nov 20, 2023 · Updated 2 years ago
- [ICML'24] Data and code for "Training-Free Long-Context Scaling of Large Language Models". ☆450 · Oct 16, 2024 · Updated last year
- An autoregressive language model implemented with an improved Transformer and DeepSpeed pipeline parallelism. ☆29 · Jan 12, 2026 · Updated 3 months ago
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples. ☆76 · Sep 18, 2022 · Updated 3 years ago
- WanJuan 1.0 multimodal corpus. ☆572 · Oct 20, 2023 · Updated 2 years ago