A prototype repo for hybrid training that combines pipeline parallelism with distributed data parallelism, with comments on the core code snippets. Feel free to copy the code and open discussions about any problems you have encountered.
☆58Jul 4, 2023Updated 2 years ago
Alternatives and similar repositories for llama-pipeline-parallel
Users that are interested in llama-pipeline-parallel are comparing it to the libraries listed below.
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.☆97Feb 5, 2024Updated 2 years ago
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆224Nov 21, 2023Updated 2 years ago
- LLM KV Cache compression - K+V dual compression, 73-99% VRAM savings, zero accuracy loss☆51Mar 30, 2026Updated last month
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆83Jan 14, 2025Updated last year
- ☆19Jul 24, 2025Updated 9 months ago
- ☆14Dec 28, 2022Updated 3 years ago
- [ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators☆26Jul 26, 2023Updated 2 years ago
- ☆11Oct 8, 2023Updated 2 years ago
- Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning☆20Feb 4, 2022Updated 4 years ago
- Fast LLM training codebase with dynamic strategy selection [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆40Jan 4, 2024Updated 2 years ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆53Jun 24, 2024Updated last year
- Mosaic Representation Learning for Self-supervised Visual Pre-training (ICLR2023, Spotlight)☆15Apr 7, 2023Updated 3 years ago
- Unsupervised Cross-lingual Sentiment Analysis (CoNLL 2019)☆10Nov 4, 2019Updated 6 years ago
- ☆18Aug 19, 2024Updated last year
- Centrally manage all of your prompts.☆14Nov 27, 2024Updated last year
- ☆11Aug 15, 2023Updated 2 years ago
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).☆62Aug 13, 2024Updated last year
- Official Codebase for "Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control" (NeurIPS 2024)☆15Oct 29, 2024Updated last year
- GEMV implementation with CUTLASS☆21Aug 21, 2025Updated 8 months ago
- ☆17Oct 15, 2023Updated 2 years ago
- ☆28Dec 2, 2024Updated last year
- Pipeline Parallelism for PyTorch☆786Aug 21, 2024Updated last year
- Zeta implementation of a reusable, plug-and-play feedforward layer from the paper "Exponentially Faster Language Modeling"☆16Nov 11, 2024Updated last year
- ☆14Jul 13, 2022Updated 3 years ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.☆758Sep 27, 2024Updated last year
- Completing the Puzzle of All-in-One Event Understanding Benchmark with Event Arguments☆14Mar 12, 2024Updated 2 years ago
- code for COLING paper "A Hybrid Model of Classification and Generation for Spatial Relation Extraction"☆10Oct 20, 2022Updated 3 years ago
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆55Oct 20, 2024Updated last year
- [ACL 2021 Findings] HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction☆10Sep 16, 2021Updated 4 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing☆14Feb 10, 2023Updated 3 years ago
- Official codebase of "Update Your Transformer to the Latest Release: Re-Basin of Task Vectors" - ICML 2025☆23Jul 30, 2025Updated 9 months ago
- ☆16Apr 11, 2022Updated 4 years ago
- [NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.☆47Oct 29, 2025Updated 6 months ago
- [TBD] "m4: A Learned Flow-level Network Simulator" by Chenning Li, Anton A. Zabreyko, Om Chabra, Arash Nasr-Esfahany, Kevin Zhao, Pratees…☆18Apr 27, 2026Updated last week
- [IJCAI'24] Official code for our paper "Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns …☆15Jul 3, 2025Updated 10 months ago
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer☆64Jul 30, 2023Updated 2 years ago
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks.☆15Aug 28, 2020Updated 5 years ago
- Source code for ICLR 2021 paper : Pre-training Text-to-Text Transformers for Concept-Centric Common Sense☆26Sep 16, 2021Updated 4 years ago
- High-performance text tokenizer library☆31Feb 2, 2024Updated 2 years ago