NVIDIA / free-threaded-python
No-GIL Python environment featuring NVIDIA Deep Learning libraries.
☆43 · Updated last week
Alternatives and similar repositories for free-threaded-python:
Users who are interested in free-threaded-python are comparing it to the libraries listed below.
- Hacks for PyTorch ☆18 · Updated last year
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆64 · Updated 2 years ago
- A tracing JIT for PyTorch ☆17 · Updated 2 years ago
- ☆21 · Updated 3 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 4 months ago
- TORCH_LOGS parser for PT2 ☆32 · Updated this week
- Benchmark tests supporting the TiledCUDA library. ☆15 · Updated 3 months ago
- Make triton easier ☆44 · Updated 8 months ago
- ☆59 · Updated 2 weeks ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆89 · Updated this week
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆55 · Updated this week
- ☆48 · Updated 11 months ago
- ☆20 · Updated last year
- ☆11 · Updated 3 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆43 · Updated 7 months ago
- TensorRT LLM Benchmark Configuration ☆13 · Updated 6 months ago
- ☆12 · Updated 3 years ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆61 · Updated 3 weeks ago
- extensible collectives library in triton ☆83 · Updated 4 months ago
- ☆21 · Updated last week
- Experiment of using Tangent to autodiff triton ☆75 · Updated last year
- Customized matrix multiplication kernels ☆53 · Updated 2 years ago
- A Ray-based data loader with per-epoch shuffling and configurable pipelining, for shuffling and loading training data for distributed tra… ☆18 · Updated 2 years ago
- A library for syntactically rewriting Python programs, pronounced (sinner). ☆70 · Updated 2 years ago
- A tracing JIT compiler for PyTorch ☆12 · Updated 3 years ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆107 · Updated 3 weeks ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated last month
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated 11 months ago