An Open-Source Distributed Deep Learning Framework
☆12Dec 14, 2022Updated 3 years ago
Alternatives and similar repositories for tarantella
Users who are interested in tarantella are comparing it to the libraries listed below
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.☆10Feb 10, 2022Updated 4 years ago
- CMU 15-745 Spring 2014☆10Mar 7, 2014Updated 12 years ago
- Implementation of a Tensorflow XLA rematerialization pass☆15Dec 20, 2019Updated 6 years ago
- Spatio-temporal pattern construct and model fusion☆11Jun 10, 2019Updated 6 years ago
- An external memory allocator example for PyTorch.☆16Aug 10, 2025Updated 6 months ago
- Make a GIF display its own file's MD5 value☆13Nov 22, 2020Updated 5 years ago
- Desktop version of ChatGPT, support manually set cookie☆19Dec 9, 2022Updated 3 years ago
- Evaluation utilities based on SymPy.☆21Dec 12, 2024Updated last year
- We introduce FixEval, a dataset for competitive programming bug fixing along with a comprehensive test suite and show the necessity of e…☆26Aug 31, 2022Updated 3 years ago
- pytorch ucc plugin☆23Jul 8, 2021Updated 4 years ago
- Some common CUDA kernel implementations (Not the fastest).☆29Dec 5, 2025Updated 3 months ago
- Sequence-level 1F1B schedule for LLMs.☆38Aug 26, 2025Updated 6 months ago
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO. I AM NOT ONE OF THE AUTHORS OF THE PAPER.☆55Nov 24, 2025Updated 3 months ago
- Toolchain built around the Megatron-LM for Distributed Training☆88Updated this week
- npcomp - An aspirational MLIR based numpy compiler☆51Jul 31, 2020Updated 5 years ago
- The code and data for the paper JiuZhang3.0☆49May 26, 2024Updated last year
- An experimental ahead of time compiler for Relay.☆49Apr 21, 2020Updated 5 years ago
- DeepXTrace is a lightweight tool for precisely diagnosing slow ranks in DeepEP-based environments.☆93Jan 16, 2026Updated last month
- Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocatio…☆90Sep 11, 2025Updated 5 months ago
- [Technical Report] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with …☆63Oct 9, 2024Updated last year
- LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification☆74Jul 14, 2025Updated 7 months ago
- [Preprint] RLVE: Scaling Up Reinforcement Learning for Language Models with Adaptive Verifiable Environments☆185Jan 12, 2026Updated last month
- Sky Computing: Accelerating Geo-distributed Computing in Federated Learning☆90Nov 22, 2022Updated 3 years ago
- Patch convolution to avoid large GPU memory usage of Conv2D☆95Jan 23, 2025Updated last year
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning☆197Mar 2, 2026Updated last week
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning☆123May 6, 2025Updated 10 months ago
- A Tool for Automatic Parallelization of Deep Learning Training in Distributed Multi-GPU Environments.☆132Feb 21, 2022Updated 4 years ago
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL☆289Oct 2, 2025Updated 5 months ago
- NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.☆122Nov 15, 2023Updated 2 years ago
- A Library for fast Hash Tables on GPUs☆132Oct 14, 2025Updated 4 months ago