project-codeflare / zero-copy-model-loading
In-depth code associated with my Medium blog post, "How to Load PyTorch Models 340 Times Faster with Ray"
☆ 24
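The blog post's core idea is to avoid per-process copies of model weights by serving them from shared memory and reconstructing tensors as views. A minimal sketch of that principle, using only NumPy and Python's stdlib `multiprocessing.shared_memory` rather than Ray's object store (which the repo actually uses); all names here are illustrative:

```python
import numpy as np
from multiprocessing import shared_memory

# Simulated model weights; in the blog post these come from a PyTorch
# state_dict converted to NumPy arrays.
weights = np.arange(1_000_000, dtype=np.float32)

# Loader process: copy the weights into shared memory exactly once.
shm = shared_memory.SharedMemory(create=True, size=weights.nbytes)
np.ndarray(weights.shape, dtype=weights.dtype, buffer=shm.buf)[:] = weights

# Inference process: attach to the same segment and build a zero-copy view.
reader = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(weights.shape, dtype=weights.dtype, buffer=reader.buf)

assert view.base is not None  # a view over shared memory, not an owning copy
assert np.array_equal(view[:5], [0, 1, 2, 3, 4])

checksum = float(view[:5].sum())  # read before detaching
del view                          # release the buffer before closing the segment
reader.close()
shm.close()
shm.unlink()
```

With Ray, the same effect falls out of `ray.put()` on NumPy arrays: workers that call `ray.get()` receive read-only views backed by the shared object store, and tensors can be rebuilt over them without copying.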
Related projects:
- Simple dependency injection framework for Python (☆ 21)
- Productionize machine learning predictions, with ONNX or without (☆ 66)
- Module, Model, and Tensor Serialization/Deserialization (☆ 175)
- PyTorch-centric eager mode debugger (☆ 43)
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 (☆ 34)
- Utilities for Training Very Large Models (☆ 56)
- A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-… (☆ 66)
- Home for OctoML PyTorch Profiler (☆ 105)
- Torch Distributed Experimental (☆ 115)
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… (☆ 145)
- The Triton backend for PyTorch TorchScript models (☆ 117)
- Hyperparameter management (☆ 43)
- Benchmarking some transformer deployments (☆ 26)
- MLflow Deployment Plugin for Ray Serve (☆ 41)
- Pygloo provides Python bindings for Gloo (☆ 16)
- Notes and artifacts from the ONNX steering committee (☆ 24)
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components (☆ 144)
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… (☆ 173)
- Context manager to profile the forward and backward times of PyTorch's nn.Module (☆ 83)
- WIP: Veloce is a low-code Ray-based parallelization library that makes machine learning computation novel, efficient, and heterogeneous (☆ 18)
- Plugin for deploying MLflow models to TorchServe (☆ 106)
- A top-like tool for monitoring GPUs in a cluster (☆ 80)
- Distributed ML Optimizer (☆ 31)
- API serving for your diffusers models (☆ 10)
- A user-friendly toolchain that enables seamless execution of ONNX models using JAX as the backend (☆ 94)
- Simple and fast low-bit matmul kernels in CUDA (☆ 48)
- A client library in Rust for NVIDIA Triton (☆ 23)
- PyTorch half-precision GEMM library with fused optional bias and optional ReLU/GELU (☆ 25)