Distributed training (multi-node) of a Transformer model
☆99 · Apr 10, 2024 · Updated 2 years ago
Alternatives and similar repositories for pytorch-transformer-distributed
Users that are interested in pytorch-transformer-distributed are comparing it to the libraries listed below.
- Notes on Direct Preference Optimization ☆28 · Apr 14, 2024 · Updated 2 years ago
- Notes and commented code for RLHF (PPO) ☆135 · Feb 27, 2024 · Updated 2 years ago
- ☆254 · Jan 2, 2025 · Updated last year
- Notes on quantization in neural networks ☆129 · Dec 14, 2023 · Updated 2 years ago
- ☆14 · Feb 23, 2025 · Updated last year
- Anything I read, whether it's a paper, a book, or an article, I'll post here. ☆11 · Feb 13, 2025 · Updated last year
- ☆49 · Feb 23, 2025 · Updated last year
- "Attention Is All You Need" implementation ☆1,226 · Jun 8, 2024 · Updated 2 years ago
- Streamline data pipelines for AI. Process datasets across 1000s of machines, and optimize data for blazing fast model training. ☆16 · Sep 18, 2024 · Updated last year
- Image retrieval/tagging with CLIP ☆13 · Jul 13, 2024 · Updated last year
- Solving the OpenAI Gym MountainCarContinuous-v0 environment with DDPG ☆21 · Jan 23, 2023 · Updated 3 years ago
- LLaMA 2 implemented from scratch in PyTorch ☆371 · Sep 25, 2023 · Updated 2 years ago
- [NeurIPS 2025] RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning ☆55 · Oct 23, 2025 · Updated 7 months ago
- ☆26 · Dec 30, 2025 · Updated 5 months ago
- OpenPipe Reinforcement Learning Experiments ☆33 · Mar 14, 2025 · Updated last year
- ☆32 · Oct 2, 2025 · Updated 8 months ago
- Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation: https://www.youtube.com/watch?v=vAmKB7iPkWw ☆614 · Dec 6, 2024 · Updated last year
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models (LLMs + MCTS + Self-Improvement) ☆51 · Dec 15, 2023 · Updated 2 years ago
- Source code for the paper "On the Pareto Front of Multilingual Neural Machine Translation" (NeurIPS 2023) ☆17 · Sep 27, 2023 · Updated 2 years ago
- Notes on the Mamba and S4 models (Mamba: Linear-Time Sequence Modeling with Selective State Spaces) ☆183 · Jan 7, 2024 · Updated 2 years ago
- ☆27 · Jun 6, 2024 · Updated 2 years ago
- ☆14 · Mar 9, 2023 · Updated 3 years ago
- Llama3 Chinese Evaluation with SuperCLUE: a comprehensive evaluation of the Chinese version of the open-source Llama3 model, based on the SuperCLUE benchmark ☆16 · Apr 21, 2024 · Updated 2 years ago
- Japanese-Arabic Dictionary (Yomitan) ☆20 · May 20, 2025 · Updated last year
- Long-range Meta-path Search through Progressive Sampling on Large-scale Heterogeneous Information Networks ☆19 · Dec 4, 2024 · Updated last year
- A catalog listing instruction sets and models available for Indic languages ☆10 · Mar 14, 2024 · Updated 2 years ago
- Technical Analysis Library using Pandas (Modin for speedup) (Python) ☆11 · Jun 24, 2019 · Updated 6 years ago
- [CVPR'25] Attention IoU: Examining Biases in CelebA using Attention Maps ☆13 · Mar 26, 2025 · Updated last year
- Minimal diffusion transformer in PyTorch ☆17 · Oct 6, 2024 · Updated last year
- ☆39 · Apr 5, 2024 · Updated 2 years ago
- Forcing Diffuse Distributions out of Language Models ☆18 · Sep 10, 2024 · Updated last year
- ReMe: A Personalized Cognitive Training Framework Based on an LLM Voice Chatbot for Research ☆17 · Jul 3, 2025 · Updated 11 months ago
- Pretraining summarization models using a corpus of nonsense ☆13 · Sep 28, 2021 · Updated 4 years ago
- ☆10 · Jan 28, 2024 · Updated 2 years ago
- ☆11 · Mar 5, 2025 · Updated last year
- Tayra is a sophisticated call center analytics platform designed to systematically evaluate and score call center audio interactions. By … ☆14 · Dec 19, 2025 · Updated 6 months ago
- POC integration of Airbyte + Dagster + Langchain ☆13 · Jun 1, 2023 · Updated 3 years ago
- Let ChatGPT answer your Gmail for you ☆15 · Feb 12, 2024 · Updated 2 years ago
- Temporal and Causal Reasoning (dataset) ☆10 · Apr 19, 2022 · Updated 4 years ago