pytorch / test-infra
This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic to track disabled tests and slow tests, as well as our continuous integration jobs HUD/dashboard.
☆103Updated this week
Alternatives and similar repositories for test-infra
Users who are interested in test-infra are comparing it to the libraries listed below.
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…☆162Updated 2 months ago
- PyTorch RFCs (experimental)☆136Updated 6 months ago
- A library to analyze PyTorch traces.☆448Updated 3 weeks ago
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup…☆406Updated last week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning.☆320Updated this week
- Home for OctoML PyTorch Profiler☆114Updated 2 years ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")☆365Updated this week
- ☆185Updated last year
- ☆148Updated last month
- Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools☆71Updated this week
- ☆338Updated last week
- A tensor-aware point-to-point communication primitive for machine learning☆279Updated last month
- Torch Distributed Experimental☆117Updated last year
- TorchFix - a linter for PyTorch-using code with autofix support☆152Updated 3 months ago
- This repository contains the experimental PyTorch native float8 training UX☆227Updated last year
- MLPerf™ logging library☆37Updated last week
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the …☆239Updated this week
- Provide Python access to the NVML library for GPU diagnostics☆253Updated 3 months ago
- ☆252Updated last year
- An experimental implementation of compiler-driven automatic sharding of models across a given device mesh.☆47Updated this week
- A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.☆901Updated this week
- TORCH_LOGS parser for PT2☆69Updated last month
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)☆456Updated last week
- extensible collectives library in triton☆91Updated 8 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.☆216Updated this week
- jax-triton contains integrations between JAX and OpenAI Triton☆436Updated this week
- oneCCL Bindings for Pytorch* (deprecated)☆103Updated last month
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.☆679Updated this week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training☆60Updated 3 weeks ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)☆201Updated last week