Distributed training (multi-node) of a Transformer model
☆94 Apr 10, 2024 Updated last year
Alternatives and similar repositories for pytorch-transformer-distributed
Users that are interested in pytorch-transformer-distributed are comparing it to the libraries listed below.
- Notes on Direct Preference Optimization ☆24 Apr 14, 2024 Updated last year
- Notes and commented code for RLHF (PPO) ☆126 Feb 27, 2024 Updated 2 years ago
- ☆242 Jan 2, 2025 Updated last year
- An efficient and scalable attention module designed to reduce memory usage and improve inference speed in large language models. Designe… ☆21 Jun 25, 2025 Updated 9 months ago
- ☆48 Feb 23, 2025 Updated last year
- Attention is all you need implementation ☆1,191 Jun 8, 2024 Updated last year
- Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training. ☆16 Sep 18, 2024 Updated last year
- ☆19 Sep 9, 2024 Updated last year
- Big Data and Machine Intelligence, Spring 2021. ☆12 Jul 2, 2021 Updated 4 years ago
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models ☆42 Jan 5, 2026 Updated 2 months ago
- image retrieval/tagging with CLIP ☆13 Jul 13, 2024 Updated last year
- LLaMA 2 implemented from scratch in PyTorch ☆367 Sep 25, 2023 Updated 2 years ago
- [NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning ☆52 Oct 23, 2025 Updated 5 months ago
- ☆23 Dec 30, 2025 Updated 2 months ago
- OpenPipe Reinforcement Learning Experiments ☆32 Mar 14, 2025 Updated last year
- Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw ☆598 Dec 6, 2024 Updated last year
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement) ☆50 Dec 15, 2023 Updated 2 years ago
- How can we improve name matching in screening tools? ☆15 Aug 13, 2025 Updated 7 months ago
- Source code for paper "On the Pareto Front of Multilingual Neural Machine Translation" @ NeurIPS 2023 ☆17 Sep 27, 2023 Updated 2 years ago
- Llama3开源模型中文版-全方位测评,基于SuperCLUE基准 | Llama3 Chinese Evaluation with SuperCLUE ☆16 Apr 21, 2024 Updated last year
- JapaneseArabic Dictionary (日本語・アラビア語辞書) قاموس اللغة اليابانية والعربية (Yomitan) ☆19 May 20, 2025 Updated 10 months ago
- An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library. ☆14 Jan 9, 2024 Updated 2 years ago
- Long-range Meta-path Search through Progressive Sampling on Large-scale Heterogeneous Information Networks ☆19 Dec 4, 2024 Updated last year
- ☆12 Mar 28, 2023 Updated 3 years ago
- This repository contains sample for the Speech Service Voice live API ☆25 Updated this week
- minimal diffusion transformer in pytorch. ☆17 Oct 6, 2024 Updated last year
- Pretraining summarization models using a corpus of nonsense ☆13 Sep 28, 2021 Updated 4 years ago
- This is the repository for our EMNLP 2022 paper "The Importance of Being Parameters: An Intra-Distillation Method for Serious Gains". ☆10 Jun 2, 2023 Updated 2 years ago
- Let ChatGPT answer your Gmail for you ☆15 Feb 12, 2024 Updated 2 years ago
- Temporal and Causal Reasoning (dataset) ☆10 Apr 19, 2022 Updated 3 years ago
- [ICLR 2026] Thinking on the Fly: Test-Time Reasoning Enhancement via Latent Thought Policy Optimization ☆24 Mar 6, 2026 Updated 3 weeks ago
- ☆17 Dec 23, 2025 Updated 3 months ago
- Simplest AlphaZero Implementation ☆25 Nov 6, 2024 Updated last year
- ☆11 Aug 28, 2025 Updated 7 months ago
- CSE 6363 - Machine Learning ☆15 Mar 2, 2026 Updated 3 weeks ago
- ✱ Understanding the underlying learning dynamics of simple tasks in Transformer networks ☆18 Aug 16, 2024 Updated last year
- ☆13 Feb 8, 2019 Updated 7 years ago
- ☆19 Oct 2, 2023 Updated 2 years ago
- ☆12 Nov 15, 2022 Updated 3 years ago