A Python library that transfers PyTorch tensors between CPU and NVMe
☆125 · Nov 27, 2024 · Updated last year
Alternatives and similar repositories for TensorNVMe
Users who are interested in TensorNVMe are comparing it to the libraries listed below.
- Performance benchmarking with ColossalAI ☆39 · Jul 6, 2022 · Updated 3 years ago
- Examples of training models with hybrid parallelism using ColossalAI ☆339 · Mar 23, 2023 · Updated 3 years ago
- ☆12 · Apr 30, 2024 · Updated 2 years ago
- ☆30 · Sep 4, 2023 · Updated 2 years ago
- ☆28 · Jul 11, 2021 · Updated 4 years ago
- A memory-efficient DLRM training solution using ColossalAI ☆108 · Nov 22, 2022 · Updated 3 years ago
- Large-scale model inference. ☆629 · Sep 12, 2023 · Updated 2 years ago
- A collection of models built with ColossalAI ☆33 · Nov 22, 2022 · Updated 3 years ago
- Scalable PaLM implementation of PyTorch ☆190 · Dec 19, 2022 · Updated 3 years ago
- Accuracy 77%. Large batch deep learning optimizer LARS for ImageNet with PyTorch and ResNet, using Horovod for distribution. Optional acc… ☆38 · Jun 1, 2021 · Updated 5 years ago
- ☆24 · May 29, 2026 · Updated last week
- PyTorch implementation of LAMB for ImageNet/ResNet-50 training ☆13 · May 13, 2021 · Updated 5 years ago
- Elixir: Train a Large Language Model on a Small GPU Cluster ☆16 · Jun 8, 2023 · Updated 3 years ago
- ☆12 · Sep 1, 2023 · Updated 2 years ago
- A curated list of awesome projects and papers for distributed training or inference ☆279 · Oct 8, 2024 · Updated last year
- Automated Parallelization System and Infrastructure for Multiple Ecosystems ☆82 · Nov 19, 2024 · Updated last year
- ☆21 · Jun 6, 2024 · Updated 2 years ago
- ☆21 · Jul 2, 2022 · Updated 3 years ago
- ☆14 · Nov 7, 2025 · Updated 7 months ago
- TVMScript kernel for deformable attention ☆25 · Dec 15, 2021 · Updated 4 years ago
- ☆227 · Mar 28, 2026 · Updated 2 months ago
- GeminiFS: A Companion File System for GPUs ☆82 · Feb 18, 2025 · Updated last year
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆146 · Jun 25, 2022 · Updated 3 years ago
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆71 · Mar 20, 2025 · Updated last year
- ☆12 · Mar 26, 2024 · Updated 2 years ago
- Efficient AI Inference & Serving ☆480 · Jan 8, 2024 · Updated 2 years ago
- Efficient Auto-scalable Scientific Infrastructure for Engineers and Researchers ☆14 · Sep 8, 2025 · Updated 9 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆223 · Aug 19, 2024 · Updated last year
- C++17 implementation of einops for libtorch - clear and reliable tensor manipulations with einstein-like notation ☆12 · Oct 16, 2023 · Updated 2 years ago
- An open-source implementation of Regional Adaptive Sampling (RAS), a novel diffusion model sampling strategy that introduces regional var… ☆152 · Apr 10, 2026 · Updated 2 months ago
- Torch Distributed Experimental ☆117 · Aug 5, 2024 · Updated last year
- A simple API to use CUPTI ☆10 · Aug 19, 2025 · Updated 9 months ago
- ☆62 · Updated this week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆126 · Dec 25, 2025 · Updated 5 months ago
- RPCNIC: A High-Performance and Reconfigurable PCIe-attached RPC Accelerator [HPCA2025] ☆15 · Dec 9, 2024 · Updated last year
- [PACT'24] GraNNDis. A fast and unified distributed graph neural network (GNN) training framework for both full-batch (full-graph) and min… ☆10 · Aug 13, 2024 · Updated last year
- Fast low-bit matmul kernels in Triton ☆467 · May 15, 2026 · Updated 3 weeks ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆44 · Nov 4, 2022 · Updated 3 years ago
- This is a clone of an SVN repository at http://pagecache-mangagement.googlecode.com/svn/trunk. It had been cloned by http://svn2github.co… ☆11 · May 23, 2013 · Updated 13 years ago