End-to-end steps for adding custom ops in PyTorch.
☆24 · Aug 20, 2020 · Updated 5 years ago
Alternatives and similar repositories for pytorch_custom_op
Users that are interested in pytorch_custom_op are comparing it to the libraries listed below.
- Implementation for "ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning", CoRL 2020 ☆16 · Jun 22, 2022 · Updated 3 years ago
- AevaScenes Python SDK ☆49 · Nov 6, 2025 · Updated 6 months ago
- An intelligent matrix format designer for SpMV ☆10 · Oct 10, 2023 · Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- CAKE Library for constant-bandwidth matrix multiplication on CPUs ☆14 · Apr 6, 2024 · Updated 2 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 · Apr 2, 2025 · Updated last year
- A warp-oriented dynamic hash table for GPUs ☆76 · Jan 19, 2024 · Updated 2 years ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of… ☆202 · Apr 12, 2026 · Updated 3 weeks ago
- My tests and experiments with some popular dl frameworks. ☆17 · Sep 11, 2025 · Updated 7 months ago
- ☆23 · Feb 16, 2022 · Updated 4 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆13 · Nov 23, 2024 · Updated last year
- A set of tools to extract library signature of binary programs at runtime. ☆28 · Jan 21, 2026 · Updated 3 months ago
- Public proposals, extensions, information and materials from the SYCL working group ☆15 · Jan 26, 2024 · Updated 2 years ago
- DeeperGEMM: crazy optimized version ☆86 · May 5, 2025 · Updated last year
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 · May 12, 2024 · Updated last year
- Noisy language compiler ☆17 · Jul 31, 2024 · Updated last year
- Examples for MS-AMP package. ☆30 · Jul 17, 2025 · Updated 9 months ago
- A GPU FP32 computation method with Tensor Cores. ☆26 · Dec 8, 2025 · Updated 4 months ago
- A selective knowledge distillation algorithm for efficient speculative decoders ☆39 · Nov 27, 2025 · Updated 5 months ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx… ☆29 · Feb 17, 2025 · Updated last year
- A study for a custom convolution layer in which the x and y components of an image pixel are added to the kernel inputs. ☆12 · Feb 21, 2020 · Updated 6 years ago
- Yinghan's Code Sample ☆364 · Jul 25, 2022 · Updated 3 years ago
- Automatic virtualization of (general) accelerators. ☆47 · Nov 28, 2022 · Updated 3 years ago
- PyTorch implementations of the BNN, XNOR-Net and BiReal-Net ☆15 · Aug 20, 2020 · Updated 5 years ago
- Linux io_uring based C++20 coroutine library ☆28 · Jun 21, 2022 · Updated 3 years ago
- image to column ☆30 · Jul 15, 2014 · Updated 11 years ago
- ☆20 · Feb 12, 2025 · Updated last year
- ☆30 · Apr 28, 2026 · Updated last week
- ☆20 · Dec 24, 2024 · Updated last year
- ParaS is an implementation to support SYCL ☆24 · Dec 10, 2025 · Updated 4 months ago
- ☆33 · Jul 19, 2024 · Updated last year
- ☆23 · Feb 18, 2025 · Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 · Mar 24, 2025 · Updated last year
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning ☆179 · Nov 11, 2025 · Updated 5 months ago
- CUDA implementation of the forward and backward passes of 'huawei-noah/AdderNet' ☆17 · Apr 16, 2020 · Updated 6 years ago
- UCSD CSE 237D Spring '20 Course Project ☆20 · Sep 4, 2023 · Updated 2 years ago
- This is a BNN_Kernel on PyTorch for 1-bit networks in image data processing ☆23 · Sep 28, 2019 · Updated 6 years ago
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 11 months ago
- Highly Efficient FFT for Exascale ☆39 · Apr 29, 2024 · Updated 2 years ago