Distributed training (multi-node) of a Transformer model
☆98 · Apr 10, 2024 · Updated 2 years ago
Alternatives and similar repositories for pytorch-transformer-distributed
Users who are interested in pytorch-transformer-distributed are comparing it to the libraries listed below.
- Notes and commented code for RLHF (PPO) ☆132 · Feb 27, 2024 · Updated 2 years ago
- ☆251 · Jan 2, 2025 · Updated last year
- Notes on quantization in neural networks ☆127 · Dec 14, 2023 · Updated 2 years ago
- ML algorithms implementations that are good for learning the underlying principles ☆28 · Dec 7, 2024 · Updated last year
- Attention is all you need implementation ☆1,218 · Jun 8, 2024 · Updated last year
- image retrieval/tagging with CLIP ☆13 · Jul 13, 2024 · Updated last year
- LLaMA 2 implemented from scratch in PyTorch ☆368 · Sep 25, 2023 · Updated 2 years ago
- [NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning ☆55 · Oct 23, 2025 · Updated 7 months ago
- OpenPipe Reinforcement Learning Experiments ☆32 · Mar 14, 2025 · Updated last year
- ☆29 · Oct 2, 2025 · Updated 7 months ago
- Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw ☆610 · Dec 6, 2024 · Updated last year
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement) ☆51 · Dec 15, 2023 · Updated 2 years ago
- Source code for paper "On the Pareto Front of Multilingual Neural Machine Translation" @ NeurIPS 2023 ☆17 · Sep 27, 2023 · Updated 2 years ago
- Slides for "Retrieval Augmented Generation" video ☆26 · Nov 27, 2023 · Updated 2 years ago
- ☆27 · Jun 6, 2024 · Updated last year
- ☆14 · Mar 9, 2023 · Updated 3 years ago
- An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library. ☆14 · Jan 9, 2024 · Updated 2 years ago
- Traction adaptive motion planning using sampling augmented adaptive RTI ☆11 · Jun 6, 2021 · Updated 4 years ago
- Long-range Meta-path Search through Progressive Sampling on Large-scale Heterogeneous Information Networks ☆19 · Dec 4, 2024 · Updated last year
- Multi-agent system for booking appointments and generating PDF invoices ☆13 · Jul 16, 2025 · Updated 10 months ago
- ☆12 · Nov 5, 2024 · Updated last year
- [CVPR'25] Attention IoU: Examining Biases in CelebA using Attention Maps ☆13 · Mar 26, 2025 · Updated last year
- ☆12 · Mar 28, 2023 · Updated 3 years ago
- minimal diffusion transformer in pytorch. ☆17 · Oct 6, 2024 · Updated last year
- ☆39 · Apr 5, 2024 · Updated 2 years ago
- ReMe: A Personalized Cognitive Training Framework Based on an LLM Voice Chatbot for Research ☆18 · Jul 3, 2025 · Updated 10 months ago
- Pretraining summarization models using a corpus of nonsense ☆13 · Sep 28, 2021 · Updated 4 years ago
- ☆11 · Mar 5, 2025 · Updated last year
- Thesis project about Visual Anomaly Detection based on Self Supervised Learning. The model identifies anomalies from information acquired… ☆10 · Apr 14, 2023 · Updated 3 years ago
- Tayra is a sophisticated call center analytics platform designed to systematically evaluate and score call center audio interactions. By … ☆14 · Dec 19, 2025 · Updated 5 months ago
- We consider the problem of online trajectory design under time-varying environments. We formulate the general trajectory optimization pro… ☆12 · Jan 1, 2020 · Updated 6 years ago
- Temporal and Causal Reasoning (dataset) ☆10 · Apr 19, 2022 · Updated 4 years ago
- Code and Data release for "Improving Multilingual Translation by Representation and Gradient Regularization" (Yang et al. EMNLP 2021), an… ☆13 · Aug 12, 2024 · Updated last year
- C++ implementation of the GJK algorithm for convex polygon collision detection. ☆11 · Aug 22, 2019 · Updated 6 years ago
- https://www.text-mining.ro/ demonstrates how to design & implement a Web Search Engine ☆19 · Apr 2, 2026 · Updated last month
- Master's Thesis on Lane Change in Autonomous Vehicles. ☆12 · Aug 19, 2022 · Updated 3 years ago
- ☆17 · Dec 23, 2025 · Updated 5 months ago
- Simplest AlphaZero Implementation ☆26 · Nov 6, 2024 · Updated last year
- ☆13 · Feb 8, 2019 · Updated 7 years ago