A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to copy code and launch discussions about the problems you have encoured.
☆58Jul 4, 2023Updated 2 years ago
Alternatives and similar repositories for llama-pipeline-parallel
Users that are interested in llama-pipeline-parallel are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Train llm (bloom, llama, baichuan2-7b, chatglm3-6b) with deepspeed pipeline mode. Faster than zero/zero++/fsdp.☆97Feb 5, 2024Updated 2 years ago
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆224Nov 21, 2023Updated 2 years ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆83Jan 14, 2025Updated last year
- ☆19Jul 24, 2025Updated 10 months ago
- ☆14Dec 28, 2022Updated 3 years ago
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- [ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators☆26Jul 26, 2023Updated 2 years ago
- Simhash and near-duplicate detection☆17Dec 6, 2013Updated 12 years ago
- ☆11Oct 8, 2023Updated 2 years ago
- Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning☆12Aug 23, 2025Updated 9 months ago
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler];☆41Jan 4, 2024Updated 2 years ago
- ☆18Aug 19, 2024Updated last year
- 集中管理所有的prompt。☆14Nov 27, 2024Updated last year
- The appendix and core code of model CauSTG, for accepted paper in KDD 2023.☆12Jun 15, 2023Updated 2 years ago
- ☆11Aug 15, 2023Updated 2 years ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- An open-source library for contamination detection in NLP datasets and Large Language Models (LLMs).☆61Aug 13, 2024Updated last year
- Official Codebase for "Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control" (NeurIPS 2024)☆15Oct 29, 2024Updated last year
- GEMV implementation with CUTLASS☆21Aug 21, 2025Updated 9 months ago
- ☆17Oct 15, 2023Updated 2 years ago
- Distributed SDDMM Kernel☆12Jul 8, 2022Updated 3 years ago
- ☆14Jul 13, 2022Updated 3 years ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.☆759Sep 27, 2024Updated last year
- code for COLING paper "A Hybrid Model of Classification and Generation for Spatial Relation Extraction"☆10Oct 20, 2022Updated 3 years ago
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆55Oct 20, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- [ACL 2021 Findings] HySPA: Hybrid Span Generation for Scalable Text-to-Graph Extraction☆10Sep 16, 2021Updated 4 years ago
- [EMNLP 2022] Language Model Pre-Training with Sparse Latent Typing☆14Feb 10, 2023Updated 3 years ago
- ☆16Apr 11, 2022Updated 4 years ago
- [IJCAI'24] Official code for our paper "Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns …☆15Jul 3, 2025Updated 10 months ago
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer☆64Jul 30, 2023Updated 2 years ago
- Source code for ICLR 2021 paper : Pre-training Text-to-Text Transformers for Concept-Centric Common Sense☆26Sep 16, 2021Updated 4 years ago
- ☆16May 15, 2025Updated last year
- 时间关键词正则提取以及标准化☆20Dec 19, 2021Updated 4 years ago
- Code for Unsupervised Multi-Target Domain Adaptation: An Information Theoretic Approach☆14Jul 19, 2020Updated 5 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- The Web Conference 2020: Structure-Feature based Graph Self-adaptive Pooling☆30Apr 21, 2020Updated 6 years ago
- Ring attention implementation with flash attention☆1,021Sep 10, 2025Updated 8 months ago
- ☆21Mar 7, 2024Updated 2 years ago
- Simple Contrastive Multi-View Clustering with Data-Level Fusion☆15Jul 25, 2025Updated 10 months ago
- Visual and Embodied Concepts evaluation benchmark☆21Oct 10, 2023Updated 2 years ago
- Code publication to the paper "Normalized Attention Without Probability Cage"☆17Nov 9, 2021Updated 4 years ago
- Towards Systematic Measurement for Long Text Quality☆38Sep 5, 2024Updated last year