A prototype repo for hybrid training that combines pipeline parallelism with distributed data parallelism, with comments on the core code snippets. Feel free to copy the code and open discussions about any problems you have encountered.
☆58Jul 4, 2023Updated 2 years ago
Alternatives and similar repositories for llama-pipeline-parallel
Users that are interested in llama-pipeline-parallel are comparing it to the libraries listed below.
- Train llm (bloom, llama, baichuan2-7b, chatglm3-6b) with deepspeed pipeline mode. Faster than zero/zero++/fsdp.☆97Feb 5, 2024Updated 2 years ago
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆224Nov 21, 2023Updated 2 years ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆83Jan 14, 2025Updated last year
- ☆19Jul 24, 2025Updated 8 months ago
- ☆14Dec 28, 2022Updated 3 years ago
- Simhash and near-duplicate detection☆17Dec 6, 2013Updated 12 years ago
- Contrastive Object-level Pre-training with Spatial Noise Curriculum Learning☆20Feb 4, 2022Updated 4 years ago
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];☆40Jan 4, 2024Updated 2 years ago
- Mosaic Representation Learning for Self-supervised Visual Pre-training (ICLR2023, Spotlight)☆15Apr 7, 2023Updated 2 years ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆53Jun 24, 2024Updated last year
- Fully open reproduction of DeepSeek-R1☆11Mar 24, 2025Updated last year
- Unsupervised Cross-lingual Sentiment Analysis (CoNLL 2019)☆10Nov 4, 2019Updated 6 years ago
- Centrally manage all your prompts.☆14Nov 27, 2024Updated last year
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).☆60Aug 13, 2024Updated last year
- The appendix and core code of model CauSTG, for accepted paper in KDD 2023.☆12Jun 15, 2023Updated 2 years ago
- ☆17Oct 15, 2023Updated 2 years ago
- ☆28Dec 2, 2024Updated last year
- Pipeline Parallelism for PyTorch☆786Aug 21, 2024Updated last year
- ☆13Mar 24, 2024Updated 2 years ago
- Distributed SDDMM Kernel☆12Jul 8, 2022Updated 3 years ago
- Zeta implementation of a reusable and plug in and play feedforward from the paper "Exponentially Faster Language Modeling"☆16Nov 11, 2024Updated last year
- ☆14Jul 13, 2022Updated 3 years ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.☆755Sep 27, 2024Updated last year
- 🖖 A graph-based note-taking system that aims to increase how often you use your personal notes!☆12Jan 17, 2021Updated 5 years ago
- [ACL 2021 Findings] HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction☆10Sep 16, 2021Updated 4 years ago
- A framework for few-shot evaluation of autoregressive language models.☆12Jul 14, 2025Updated 8 months ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing☆14Feb 10, 2023Updated 3 years ago
- ☆19Nov 21, 2024Updated last year
- ☆16Apr 11, 2022Updated 3 years ago
- [NeurIPS 2025 Spotlight] A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone.☆46Oct 29, 2025Updated 4 months ago
- [IJCAI'24] Official code for our paper "Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns …☆14Jul 3, 2025Updated 8 months ago
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer☆64Jul 30, 2023Updated 2 years ago
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks.☆15Aug 28, 2020Updated 5 years ago
- Source code for ICLR 2021 paper : Pre-training Text-to-Text Transformers for Concept-Centric Common Sense☆26Sep 16, 2021Updated 4 years ago
- High-performance text tokenizer library☆32Feb 2, 2024Updated 2 years ago
- The Web Conference 2020: Structure-Feature based Graph Self-adaptive Pooling☆30Apr 21, 2020Updated 5 years ago
- ☆21Mar 7, 2024Updated 2 years ago
- Simple Contrastive Multi-View Clustering with Data-Level Fusion☆15Jul 25, 2025Updated 8 months ago
- Code publication to the paper "Normalized Attention Without Probability Cage"☆17Nov 9, 2021Updated 4 years ago