facebookresearch / loop_nest
Loop Nest - Linear algebra compiler and code generator.
☆22 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for loop_nest
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆20 · Updated this week
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated last month
- ☆14 · Updated last month
- Better bindings for Python ☆17 · Updated last year
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme ☆46 · Updated 2 months ago
- ☆48 · Updated 3 months ago
- A tracing JIT compiler for PyTorch ☆12 · Updated 2 years ago
- Experimental scripts for researching data adaptive learning rate scheduling. ☆23 · Updated last year
- Automatically insert nvtx ranges to PyTorch models ☆17 · Updated 3 years ago
- ☆20 · Updated last year
- ☆18 · Updated 2 years ago
- TORCH_LOGS parser for PT2 ☆21 · Updated 3 weeks ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 · Updated 7 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆35 · Updated 3 months ago
- Customized matrix multiplication kernels ☆53 · Updated 2 years ago
- benchmarking some transformer deployments ☆26 · Updated last year
- ☆9 · Updated 3 years ago
- Explore training for quantized models ☆10 · Updated 2 weeks ago
- A tracing JIT for PyTorch ☆17 · Updated 2 years ago
- Implementation of a TensorFlow XLA rematerialization pass ☆15 · Updated 4 years ago
- ☆12 · Updated 3 years ago
- Standalone command-line tool for compiling Triton kernels ☆15 · Updated last month
- Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆20 · Updated 4 years ago
- ☆23 · Updated 2 months ago
- ☆17 · Updated 2 weeks ago
- Awesome Triton Resources ☆18 · Updated 3 weeks ago
- MLPerf™ Mobile models ☆24 · Updated 3 weeks ago
- code for paper "Accessing higher dimensions for unsupervised word translation" ☆21 · Updated last year
- Source-to-Source Debuggable Derivatives in Pure Python ☆14 · Updated 9 months ago