NVIDIA / free-threaded-python
No-GIL Python environment featuring NVIDIA Deep Learning libraries.
☆20 · Updated last week
Related projects
Alternatives and complementary repositories for free-threaded-python
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last month
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 · Updated this week
- Loop Nest - Linear algebra compiler and code generator. ☆22 · Updated 2 years ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆59 · Updated 7 months ago
- ☆24 · Updated last week
- ☆14 · Updated last month
- TensorRT LLM Benchmark Configuration ☆11 · Updated 3 months ago
- Hacks for PyTorch ☆17 · Updated last year
- Experiment of using Tangent to autodiff triton ☆72 · Updated 9 months ago
- ☆17 · Updated 2 weeks ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 · Updated last year
- Experimental scripts for researching data adaptive learning rate scheduling. ☆23 · Updated last year
- MLPerf™ logging library ☆30 · Updated last week
- Benchmarking PyTorch 2.0 different models ☆21 · Updated last year
- [WIP] Context parallel attention that works with torch.compile ☆20 · Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆35 · Updated 3 months ago
- ☆42 · Updated 11 months ago
- benchmarking some transformer deployments ☆26 · Updated last year
- Computing the greatest common divisor with transformers, source code for the paper https://arxiv.org/abs/2308.15594 ☆12 · Updated 7 months ago
- extensible collectives library in triton ☆65 · Updated last month
- CUDA 12.2 HMM demos ☆17 · Updated 3 months ago
- Code for Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture (accepted by PVLDB). The outdated wr… ☆8 · Updated last year
- Explore training for quantized models ☆10 · Updated this week
- Source-to-Source Debuggable Derivatives in Pure Python ☆14 · Updated 9 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆16 · Updated this week
- ☆48 · Updated 3 months ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆98 · Updated last month
- ☆20 · Updated last year
- End to End steps for adding custom ops in PyTorch. ☆19 · Updated 4 years ago