NVIDIA / free-threaded-python
No-GIL Python environment featuring NVIDIA Deep Learning libraries.
☆22 · Updated this week
Related projects
Alternatives and complementary repositories for free-threaded-python
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last month
- ☆32 · Updated this week
- An experiment in using Tangent to autodiff Triton ☆72 · Updated 9 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 · Updated last year
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆63 · Updated 2 years ago
- TensorRT LLM Benchmark Configuration ☆11 · Updated 3 months ago
- ☆17 · Updated 3 weeks ago
- Experimental scripts for researching data adaptive learning rate scheduling. ☆23 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 · Updated last week
- Memory Optimizations for Deep Learning (ICML 2023) ☆60 · Updated 8 months ago
- Benchmarking different models with PyTorch 2.0 ☆21 · Updated last year
- ☆29 · Updated 5 months ago
- ☆20 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆35 · Updated 4 months ago
- MLPerf™ logging library ☆30 · Updated this week
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated last year
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆46 · Updated 2 months ago
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated last year
- ☆55 · Updated 5 months ago
- Hacks for PyTorch ☆17 · Updated last year
- ☆14 · Updated last month
- An auxiliary project analyzing the characteristics of KV in DiT attention. ☆15 · Updated this week
- Make Triton easier ☆41 · Updated 5 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆98 · Updated last week
- Personal solutions to the Triton Puzzles ☆16 · Updated 4 months ago
- pytorch-profiler ☆50 · Updated last year
- Extensible collectives library in Triton ☆72 · Updated last month
- Odysseus: Playground of LLM Sequence Parallelism ☆57 · Updated 5 months ago
- ☆12 · Updated last month