facebookresearch / loop_nest
Loop Nest - Linear algebra compiler and code generator.
☆22Updated 2 years ago
Alternatives and similar repositories for loop_nest
Users who are interested in loop_nest are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support☆27Updated 10 months ago
- Experimental scripts for researching data adaptive learning rate scheduling.☆23Updated last year
- Code and data for paper "(How) do Language Models Track State?"☆16Updated 4 months ago
- Udacity CS344 Introduction to Parallel Programming (https://classroom.udacity.com/courses/cs344), with assignments/materials updated to …☆46Updated 3 years ago
- ☆19Updated 2 years ago
- ☆16Updated 10 months ago
- Customized matrix multiplication kernels☆56Updated 3 years ago
- code associated with paper "Sparse Bayesian Optimization"☆26Updated last year
- Better bindings for Python☆17Updated 2 years ago
- Source-to-Source Debuggable Derivatives in Pure Python☆15Updated last year
- Direct solver for sparse SPD matrices for nonlinear optimization. Implements supernodal Cholesky decomposition algorithm, and supports GP…☆91Updated 2 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries.☆63Updated 3 months ago
- Make triton easier☆47Updated last year
- Texture mapping with variational auto-encoders☆40Updated 3 years ago
- SParse AcceleRation on Tensor Architecture☆17Updated 4 months ago
- Some CUDA design patterns and a bit of template magic for CUDA☆156Updated 2 years ago
- ☆52Updated last year
- cuASR: CUDA Algebra for Semirings☆36Updated 2 years ago
- Experiment of using Tangent to autodiff triton☆80Updated last year
- NumPy+Jax with named axes and an uncompromising attitude☆21Updated 5 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs☆59Updated last week
- Python package for Geometric / Clifford Algebra with Pytorch.☆13Updated 3 months ago
- Exploration into the Firefly algorithm in Pytorch☆40Updated 5 months ago
- A thin, highly portable toolkit for efficiently compiling dense loop-based computation.☆148Updated 2 years ago
- Quantize transformers to any learned arbitrary 4-bit numeric format☆39Updated last month
- code for paper "Accessing higher dimensions for unsupervised word translation"☆21Updated 2 years ago
- Example python package with pybind11 cpp extension☆57Updated 4 years ago
- ☆22Updated last year
- Pipeline parallelism for the minimalist☆18Updated this week
- A tracing JIT compiler for PyTorch☆13Updated 3 years ago