NVIDIA / free-threaded-python
No-GIL Python environment featuring NVIDIA Deep Learning libraries.
☆45 · Updated 2 weeks ago
Alternatives and similar repositories for free-threaded-python:
Users who are interested in free-threaded-python are comparing it to the libraries listed below.
- A tracing JIT for PyTorch · ☆17 · Updated 2 years ago
- FlexAttention w/ FlashAttention3 Support · ☆26 · Updated 5 months ago
- TORCH_LOGS parser for PT2 · ☆35 · Updated this week
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. By pro… · ☆68 · Updated this week
- Hacks for PyTorch · ☆18 · Updated last year
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … · ☆64 · Updated 3 years ago
- ☆21 · Updated 2 weeks ago
- ☆12 · Updated 3 years ago
- ☆62 · Updated 3 weeks ago
- Benchmark tests supporting the TiledCUDA library. · ☆15 · Updated 4 months ago
- Collection of scripts to build PyTorch and the domain libraries from source. · ☆10 · Updated this week
- Make triton easier · ☆47 · Updated 9 months ago
- ☆49 · Updated last year
- extensible collectives library in triton · ☆84 · Updated 5 months ago
- Loop Nest - Linear algebra compiler and code generator. · ☆22 · Updated 2 years ago
- ☆11 · Updated 3 years ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. · ☆104 · Updated this week
- PyTorch centric eager mode debugger · ☆46 · Updated 3 months ago
- TensorRT LLM Benchmark Configuration · ☆13 · Updated 7 months ago
- Framework to reduce autotune overhead to zero for well known deployments. · ☆63 · Updated this week
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… · ☆23 · Updated last month
- ONNX Command-Line Toolbox · ☆35 · Updated 5 months ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme · ☆56 · Updated last month
- ☆26 · Updated this week
- Experiment of using Tangent to autodiff triton · ☆78 · Updated last year
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆39 · Updated this week
- Benchmarks to capture important workloads. · ☆30 · Updated last month
- Open deep learning compiler stack for cpu, gpu and specialized accelerators · ☆18 · Updated this week