NVIDIA / free-threaded-python
No-GIL Python environment featuring NVIDIA Deep Learning libraries.
☆41 Updated 2 months ago
Alternatives and similar repositories for free-threaded-python:
Users who are interested in free-threaded-python are comparing it to the libraries listed below.
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆63 Updated 2 years ago
- Hacks for PyTorch ☆18 Updated last year
- FlexAttention w/ FlashAttention3 Support ☆27 Updated 3 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated 10 months ago
- ☆21 Updated 3 months ago
- Collection of scripts to build PyTorch and the domain libraries from source. ☆10 Updated last week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆43 Updated 6 months ago
- ☆20 Updated last year
- ☆58 Updated 8 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆59 Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆86 Updated this week
- TensorRT LLM Benchmark Configuration ☆12 Updated 6 months ago
- Benchmark tests supporting the TiledCUDA library. ☆12 Updated 2 months ago
- Code for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB). The outdated wr… ☆9 Updated last year
- ☆21 Updated this week
- A tracing JIT for PyTorch ☆17 Updated 2 years ago
- ☆48 Updated 10 months ago
- Extensible collectives library in Triton ☆77 Updated 4 months ago
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆43 Updated this week
- PyTorch code examples for measuring the performance of collective communication calls in AI workloads ☆13 Updated 3 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 Updated last month
- MLPerf™ logging library ☆32 Updated 3 weeks ago
- Customized matrix multiplication kernels ☆53 Updated 2 years ago
- Make Triton easier ☆44 Updated 7 months ago
- TORCH_LOGS parser for PT2 ☆30 Updated this week
- ☆11 Updated 3 years ago
- Explore training for quantized models ☆13 Updated 3 weeks ago
- Learning about CUDA by writing PTX code. ☆33 Updated 11 months ago