A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on the core code snippets. Feel free to copy the code and start discussions about the problems you have encountered.
☆58 · Jul 4, 2023 · Updated 2 years ago
Alternatives and similar repositories for llama-pipeline-parallel
Users that are interested in llama-pipeline-parallel are comparing it to the libraries listed below.
- Train llm (bloom, llama, baichuan2-7b, chatglm3-6b) with deepspeed pipeline mode. Faster than zero/zero++/fsdp. ☆97 · Feb 5, 2024 · Updated 2 years ago
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism ☆224 · Nov 21, 2023 · Updated 2 years ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆84 · Jan 14, 2025 · Updated last year
- ☆19 · Jul 24, 2025 · Updated 10 months ago
- ☆15 · Dec 28, 2022 · Updated 3 years ago
- [ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators ☆26 · Jul 26, 2023 · Updated 2 years ago
- Official repo for "TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders" ☆25 · Apr 9, 2026 · Updated 2 months ago
- Simhash and near-duplicate detection ☆17 · Dec 6, 2013 · Updated 12 years ago
- ☆11 · Oct 8, 2023 · Updated 2 years ago
- Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning ☆12 · Aug 23, 2025 · Updated 9 months ago
- Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning ☆20 · Feb 4, 2022 · Updated 4 years ago
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]; ☆41 · Jan 4, 2024 · Updated 2 years ago
- Mosaic Representation Learning for Self-supervised Visual Pre-training (ICLR2023, Spotlight) ☆15 · Apr 7, 2023 · Updated 3 years ago
- Unsupervised Cross-lingual Sentiment Analysis (CoNLL 2019) ☆10 · Nov 4, 2019 · Updated 6 years ago
- ☆18 · Aug 19, 2024 · Updated last year
- The appendix and core code of model CauSTG, for accepted paper in KDD 2023. ☆12 · Jun 15, 2023 · Updated 3 years ago
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs). ☆61 · Aug 13, 2024 · Updated last year
- GEMV implementation with CUTLASS ☆21 · Aug 21, 2025 · Updated 9 months ago
- ☆17 · Oct 15, 2023 · Updated 2 years ago
- Distributed SDDMM Kernel ☆12 · Jul 8, 2022 · Updated 3 years ago
- ☆14 · Jul 13, 2022 · Updated 3 years ago
- ☆85 · Mar 12, 2026 · Updated 3 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆760 · Sep 27, 2024 · Updated last year
- Completing the Puzzle of All-in-One Event Understanding Benchmark with Event Arguments ☆14 · Mar 12, 2024 · Updated 2 years ago
- [ACL 2021 Findings] HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction ☆10 · Sep 16, 2021 · Updated 4 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing ☆14 · Feb 10, 2023 · Updated 3 years ago
- ☆16 · Apr 11, 2022 · Updated 4 years ago
- Google DeepMind: Mixture of Depths Unofficial Implementation. ☆13 · May 29, 2024 · Updated 2 years ago
- [3DV 2025] CoE: Deep Coupled Embedding for Non-Rigid Point Cloud Correspondences ☆20 · Jan 5, 2026 · Updated 5 months ago
- [IJCAI'24] Official code for our paper "Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns … ☆15 · Jul 3, 2025 · Updated 11 months ago
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer ☆64 · Jul 30, 2023 · Updated 2 years ago
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. ☆15 · Aug 28, 2020 · Updated 5 years ago
- Close your Zoom meeting tabs automatically ☆20 · Apr 14, 2024 · Updated 2 years ago
- Source code for ICLR 2021 paper : Pre-training Text-to-Text Transformers for Concept-Centric Common Sense ☆26 · Sep 16, 2021 · Updated 4 years ago
- ☆16 · May 15, 2025 · Updated last year
- sketch-rnn demo for seoul mediacity biennale 2018 ☆13 · Sep 4, 2018 · Updated 7 years ago
- Code for Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach ☆14 · Jul 19, 2020 · Updated 5 years ago
- ☆21 · Mar 7, 2024 · Updated 2 years ago
- Visual and Embodied Concepts evaluation benchmark ☆21 · Oct 10, 2023 · Updated 2 years ago