Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP.
☆97 · Feb 5, 2024 · Updated 2 years ago
Alternatives and similar repositories for gdGPT
Users who are interested in gdGPT are comparing it to the repositories listed below
- Train LLaMA on a single A100 80G node using 🤗 transformers and 🚀 DeepSpeed Pipeline Parallelism · ☆224 · Nov 21, 2023 · Updated 2 years ago
- A prototype repo for hybrid training of pipeline parallel and distributed data parallel with comments on core code snippets. Feel free to… · ☆58 · Jul 4, 2023 · Updated 2 years ago
- ☆84 · Sep 9, 2023 · Updated 2 years ago
- An e-book on industrial-grade Chinese speech recognition systems · ☆13 · Oct 30, 2020 · Updated 5 years ago
- ☆26 · Nov 7, 2022 · Updated 3 years ago
- ☆26 · Aug 31, 2023 · Updated 2 years ago
- Code for paper titled "Towards the Law of Capacity Gap in Distilling Language Models" · ☆102 · Jul 9, 2024 · Updated last year
- Deep Learning for Video Retrieval by Natural Language · ☆11 · Oct 20, 2019 · Updated 6 years ago
- AIGC evals · ☆10 · Dec 2, 2023 · Updated 2 years ago
- The official repo for our SIGIR'23 Full paper: Constructing Tree-based Index for Efficient and Effective Dense Retrieval · ☆28 · Jun 7, 2023 · Updated 2 years ago
- A packaged convolutional voice activity detector for noisy environments. · ☆14 · Jun 15, 2019 · Updated 6 years ago
- PyTorch implementation of MVP: a multi-stage vision-language pre-training framework · ☆11 · Apr 23, 2022 · Updated 3 years ago
- Semi-supervised spoken language understanding (SLU) via self-supervised speech and language model pretraining · ☆12 · Mar 23, 2021 · Updated 4 years ago
- A gomoku AI based on Alpha Zero paper. · ☆12 · May 1, 2023 · Updated 2 years ago
- The codebase for "Group-wise Contrastive Learning for Neural Dialogue Generation" (Cai et al., Findings of EMNLP 2020) · ☆55 · Feb 24, 2021 · Updated 5 years ago
- [SIGIR 2025] The official repo for "Scaling Sparse and Dense Retrieval in Decoder-Only LLMs" · ☆20 · Mar 31, 2025 · Updated 11 months ago
- The baseline method for CCIR 22 https://www.datafountain.cn/competitions/573 · ☆13 · Aug 2, 2022 · Updated 3 years ago
- ☆17 · Jul 5, 2022 · Updated 3 years ago
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking · ☆36 · Jan 20, 2026 · Updated last month
- ☆52 · Aug 14, 2024 · Updated last year
- ddl-benchmarks: Benchmarks for Distributed Deep Learning · ☆36 · May 29, 2020 · Updated 5 years ago
- Code for Robust Fine-tuning (RbFT) · ☆17 · Jan 31, 2025 · Updated last year
- PyTorch Implementation: Code for the paper "Generalizing to Unseen Domains via Adversarial Data Augmentation", NeurIPS 2018. Origin Tenso… · ☆14 · Sep 17, 2020 · Updated 5 years ago
- FTPipe and related pipeline model parallelism research. · ☆44 · May 16, 2023 · Updated 2 years ago
- ☆41 · Feb 12, 2019 · Updated 7 years ago
- Elixir: Train a Large Language Model on a Small GPU Cluster · ☆15 · Jun 8, 2023 · Updated 2 years ago
- ☆23 · Oct 30, 2023 · Updated 2 years ago
- Multi-language Enhanced LLaMA · ☆303 · Apr 13, 2023 · Updated 2 years ago
- Best practice for training LLaMA models in Megatron-LM · ☆663 · Jan 2, 2024 · Updated 2 years ago
- DisCo Transformer for Non-autoregressive MT · ☆77 · Jul 28, 2022 · Updated 3 years ago
- ☆43 · Dec 15, 2023 · Updated 2 years ago
- Simple Model Similarities Analysis · ☆21 · Feb 3, 2024 · Updated 2 years ago
- Grammatical Error Correction Based on Language Model (BERT, GPT-2) and Seq2Seq · ☆18 · Sep 5, 2019 · Updated 6 years ago
- 1.4B sLLM for Chinese and English - HammerLLM🔨 · ☆43 · Apr 7, 2024 · Updated last year
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 · ☆1,437 · Mar 20, 2024 · Updated last year
- ☆19 · Sep 20, 2022 · Updated 3 years ago
- Example of distributed learning in Julia · ☆22 · Jun 28, 2017 · Updated 8 years ago
- NamuwikiExtractor: a tool for extracting cleaned text from Namuwiki dumps · ☆19 · Feb 27, 2022 · Updated 4 years ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". · ☆83 · Jan 14, 2025 · Updated last year