facebookresearch / loop_nest
Loop Nest - Linear algebra compiler and code generator.
☆21Updated 3 years ago
Alternatives and similar repositories for loop_nest
Users that are interested in loop_nest are comparing it to the libraries listed below
- Udacity CS344 Introduction to Parallel Programming (https://classroom.udacity.com/courses/cs344), with assignments/materials updated to …☆46Updated 4 years ago
- ☆19Updated 3 years ago
- FlexAttention w/ FlashAttention3 Support☆27Updated last year
- Customized matrix multiplication kernels☆57Updated 3 years ago
- Experimental scripts for researching data adaptive learning rate scheduling.☆22Updated 2 years ago
- ☆16Updated last year
- No-GIL Python environment featuring NVIDIA Deep Learning libraries.☆70Updated 9 months ago
- PyTorch interface for the IPU☆181Updated 2 years ago
- A tracing JIT compiler for PyTorch☆13Updated 4 years ago
- Some CUDA design patterns and a bit of template magic for CUDA☆158Updated 2 years ago
- Direct solver for sparse SPD matrices for nonlinear optimization. Implements supernodal Cholesky decomposition algorithm, and supports GP…☆97Updated 4 months ago
- code associated with paper "Sparse Bayesian Optimization"☆26Updated 2 years ago
- ☆55Updated last year
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend.☆130Updated last week
- A LinearOperator implementation for PyTorch☆18Updated 5 years ago
- Authors' implementation of LieTransformer: Equivariant Self-Attention for Lie Groups☆36Updated 5 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.☆46Updated last year
- Texture mapping with variational auto-encoders☆40Updated 4 years ago
- cuASR: CUDA Algebra for Semirings☆44Updated 3 years ago
- Hacks for PyTorch☆19Updated 2 years ago
- code for paper "Accessing higher dimensions for unsupervised word translation"☆22Updated 2 years ago
- SParse AcceleRation on Tensor Architecture☆18Updated 10 months ago
- A thin, highly portable toolkit for efficiently compiling dense loop-based computation.☆149Updated 3 years ago
- benchmarking some transformer deployments☆26Updated last month
- A place to store reusable transformer components of my own creation or found on the interwebs☆72Updated 3 weeks ago
- Code and data for paper "(How) do Language Models Track State?"☆21Updated 10 months ago
- Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU☆31Updated 5 months ago
- Better bindings for Python☆19Updated 3 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…☆182Updated last month
- Solver for Unconstrained Binary Quadratic Optimization (UBQO, BQO, QUBO) and Max 2-SAT, based on semidefinite relaxation with constraint …☆15Updated 2 years ago