sandeepkumar-skb / pytorch_custom_op
End to End steps for adding custom ops in PyTorch.
☆24 · Aug 20, 2020 · Updated 5 years ago
Alternatives and similar repositories for pytorch_custom_op
Users that are interested in pytorch_custom_op are comparing it to the libraries listed below
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 · Apr 2, 2025 · Updated 10 months ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- My tests and experiments with some popular dl frameworks. ☆17 · Sep 11, 2025 · Updated 5 months ago
- Noisy language compiler ☆17 · Jul 31, 2024 · Updated last year
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 · Nov 23, 2024 · Updated last year
- CAKE Library for constant-bandwidth matrix multiplication on CPUs ☆14 · Apr 6, 2024 · Updated last year
- Automatic virtualization of (general) accelerators. ☆46 · Nov 28, 2022 · Updated 3 years ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 · May 12, 2024 · Updated last year
- image to column ☆30 · Jul 15, 2014 · Updated 11 years ago
- A GPU FP32 computation method with Tensor Cores. ☆26 · Dec 8, 2025 · Updated 2 months ago
- Source code for the paper "LongGenBench: Long-context Generation Benchmark" ☆24 · Oct 8, 2024 · Updated last year
- ☆22 · Feb 18, 2025 · Updated 11 months ago
- ngAP's artifact for ASPLOS'24 ☆25 · Jul 29, 2025 · Updated 6 months ago
- Marek's approach to building AMD GPU drivers for driver development ☆27 · Oct 13, 2025 · Updated 4 months ago
- ☆41 · Nov 1, 2025 · Updated 3 months ago
- Open-source repository for paper "LogGrep: Fast and Cheap Cloud Log Storage by Exploiting both Static and Runtime Patterns"(ACM Eurosys 2… ☆26 · Sep 12, 2023 · Updated 2 years ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… ☆192 · Feb 8, 2026 · Updated last week
- DeeperGEMM: crazy optimized version ☆73 · May 5, 2025 · Updated 9 months ago
- Examples for MS-AMP package. ☆30 · Jul 17, 2025 · Updated 6 months ago
- Linux io_uring based c++ 20 coroutine library ☆28 · Jun 21, 2022 · Updated 3 years ago
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU" ☆31 · Jul 7, 2020 · Updated 5 years ago
- SimplePIM is the first high-level programming framework for real-world processing-in-memory (PIM) architectures. Described in the PACT 20… ☆31 · Oct 23, 2023 · Updated 2 years ago
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 9 months ago
- Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits ☆35 · Aug 25, 2024 · Updated last year
- cuASR: CUDA Algebra for Semirings ☆44 · Aug 22, 2022 · Updated 3 years ago
- The GitHub repository for the paper "Denoising Application of Magnetotelluric Low-Frequency Signal Processing" ☆11 · Feb 22, 2023 · Updated 2 years ago
- ETHZ Heterogeneous Accelerated Compute Cluster. ☆38 · Oct 7, 2025 · Updated 4 months ago
- Kinematic and dynamic models of continuum and articulated soft robots. ☆15 · Nov 22, 2025 · Updated 2 months ago
- Attention in SRAM on Tenstorrent Grayskull ☆40 · Jul 18, 2024 · Updated last year
- PTX-EMU is a simple emulator for CUDA program. ☆37 · Apr 25, 2025 · Updated 9 months ago
- TLB Benchmarks ☆35 · Sep 11, 2017 · Updated 8 years ago
- Fork of upstream onnxruntime focused on supporting risc-v accelerators ☆88 · Mar 26, 2023 · Updated 2 years ago
- lab solutions of ICS course ☆10 · Jan 20, 2013 · Updated 13 years ago
- An artificial matrix generator in C ☆12 · Feb 16, 2023 · Updated 2 years ago
- ☆52 · Nov 5, 2024 · Updated last year
- A novel, highly optimized CUDA implementation of the k-means algorithm. ☆41 · Mar 3, 2022 · Updated 3 years ago
- ☆40 · Feb 28, 2020 · Updated 5 years ago
- BERT Sentiment Classification on the IMDb Large Movie Review Dataset. ☆16 · Sep 8, 2022 · Updated 3 years ago
- MATLAB function to fill an area with hatching ~~or speckling~~ ☆11 · Mar 4, 2018 · Updated 7 years ago