ahennequ / cuda-tensorcores-register-mapping
☆19Updated 3 years ago
Alternatives and similar repositories for cuda-tensorcores-register-mapping
Users that are interested in cuda-tensorcores-register-mapping are comparing it to the libraries listed below
- Udacity CS344 Introduction to Parallel Programming (https://classroom.udacity.com/courses/cs344), with assignments/materials updated to …☆46Updated 4 years ago
- Customized matrix multiplication kernels☆57Updated 3 years ago
- Hacks for PyTorch☆19Updated 2 years ago
- ☆34Updated 7 months ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!☆54Updated 2 months ago
- ONNX Command-Line Toolbox☆35Updated last year
- Guide on how to convert custom PyTorch layers when using ONNX.☆22Updated 7 years ago
- Texture mapping with variational auto-encoders☆40Updated 4 years ago
- ☆160Updated 2 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.☆46Updated last year
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…☆182Updated last month
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …☆65Updated 3 years ago
- TVMScript kernel for deformable attention☆25Updated 4 years ago
- Experimental scripts for researching data adaptive learning rate scheduling.☆22Updated 2 years ago
- Some CUDA design patterns and a bit of template magic for CUDA☆158Updated 2 years ago
- A PyTorch Dataset that caches samples in shared memory, accessible globally to all processes☆23Updated 3 years ago
- IntLLaMA: A fast and light quantization solution for LLaMA☆18Updated 2 years ago
- An open source implementation of CLIP.☆33Updated 3 years ago
- PyTorch interface for the IPU☆181Updated 2 years ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings☆46Updated 2 years ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention☆220Updated 2 years ago
- A short article showing how to load PyTorch models with linear memory consumption☆34Updated 3 years ago
- ☆23Updated 3 years ago
- Authors' implementation of LieTransformer: Equivariant Self-Attention for Lie Groups☆36Updated 4 years ago
- Code for the paper "Rethinking the Data Annotation Process for Multi-view 3D Pose Estimation with Active Learning and Self-Training"☆22Updated 2 years ago
- Torch Distributed Experimental☆117Updated last year
- DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos☆59Updated 2 years ago
- Example repository for custom C++/CUDA operators for TorchScript☆114Updated 3 years ago
- FlexAttention w/ FlashAttention3 Support☆27Updated last year
- ☆29Updated 3 years ago