facebookresearch / loop_nest
Loop Nest - Linear algebra compiler and code generator.
☆21 · Updated 3 years ago
Alternatives and similar repositories for loop_nest
Users who are interested in loop_nest are comparing it to the libraries listed below.
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last year
- Experimental scripts for researching data adaptive learning rate scheduling. ☆22 · Updated 2 years ago
- Customized matrix multiplication kernels ☆57 · Updated 3 years ago
- ☆19 · Updated 3 years ago
- Udacity CS344 Introduction to Parallel Programming (https://classroom.udacity.com/courses/cs344), with assignments/materials updated to … ☆46 · Updated 4 years ago
- Code and data for paper "(How) do Language Models Track State?" ☆20 · Updated 7 months ago
- ☆53 · Updated last year
- Texture mapping with variational auto-encoders ☆40 · Updated 4 years ago
- Multi-framework implementation of Deep Kernel Shaping and Tailored Activation Transformations, which are methods that modify neural netwo… ☆74 · Updated 4 months ago
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 · Updated last year
- A LinearOperator implementation for PyTorch ☆18 · Updated 4 years ago
- Hacks for PyTorch ☆19 · Updated 2 years ago
- Better bindings for Python ☆19 · Updated 2 years ago
- code associated with paper "Sparse Bayesian Optimization" ☆26 · Updated 2 years ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆124 · Updated 2 months ago
- NumPy+Jax with named axes and an uncompromising attitude ☆23 · Updated 8 months ago
- ☆16 · Updated last year
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆68 · Updated 7 months ago
- A tracing JIT compiler for PyTorch ☆13 · Updated 3 years ago
- cuASR: CUDA Algebra for Semirings ☆42 · Updated 3 years ago
- Example python package with pybind11 cpp extension ☆57 · Updated 4 years ago
- Solver for Unconstrained Binary Quadratic Optimization (UBQO, BQO, QUBO) and Max 2-SAT, based on semidefinite relaxation with constraint … ☆15 · Updated 2 years ago
- code for paper "Accessing higher dimensions for unsupervised word translation" ☆21 · Updated 2 years ago
- A PyTorch Dataset that caches samples in shared memory, accessible globally to all processes ☆22 · Updated 3 years ago
- ☆74 · Updated 2 years ago
- Supplementary code for the paper "Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces" ☆44 · Updated 2 years ago
- [TMLR 2022] Curvature access through the generalized Gauss-Newton's low-rank structure: Eigenvalues, eigenvectors, directional derivative… ☆17 · Updated 2 years ago
- A thin, highly portable toolkit for efficiently compiling dense loop-based computation. ☆149 · Updated 2 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated last year
- Some CUDA design patterns and a bit of template magic for CUDA ☆156 · Updated 2 years ago