End-to-end steps for adding custom ops in PyTorch.
☆24Aug 20, 2020Updated 5 years ago
Alternatives and similar repositories for pytorch_custom_op
Users that are interested in pytorch_custom_op are comparing it to the libraries listed below.
- [ICLR 2025 Oral] The official implementation of "Diffusion-Based Planning for Autonomous Driving with Flexible Guidance"☆35Updated this week
- Implementation for "ROLL: Visual Self-Supervised Reinforcement Learning with Object Reasoning", CoRL 2020☆16Jun 22, 2022Updated 3 years ago
- An intelligent matrix format designer for SpMV☆10Oct 10, 2023Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library.☆19Nov 19, 2024Updated last year
- Marek's approach to building AMD GPU drivers for driver development☆28May 17, 2026Updated last week
- CAKE Library for constant-bandwidth matrix multiplication on CPUs☆14Apr 6, 2024Updated 2 years ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated last year
- A warp-oriented dynamic hash table for GPUs☆76Jan 19, 2024Updated 2 years ago
- The autoware diffusion planner package☆33Jul 24, 2025Updated 10 months ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of…☆202May 17, 2026Updated last week
- My tests and experiments with some popular dl frameworks.☆17Sep 11, 2025Updated 8 months ago
- Example to build PyTorch CUDA extension using CMake (with pybind11 and scikit-build)☆12May 26, 2020Updated 6 years ago
- ☆23Feb 16, 2022Updated 4 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆13Nov 23, 2024Updated last year
- Free TPU OS running on the FPGA☆10May 6, 2023Updated 3 years ago
- Source code for the paper "LongGenBench: Long-context Generation Benchmark"☆22Oct 8, 2024Updated last year
- DeeperGEMM: crazy optimized version☆86May 5, 2025Updated last year
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.☆19May 12, 2024Updated 2 years ago
- Noisy language compiler☆17Jul 31, 2024Updated last year
- Commands that will make you more comfortable with the ROCm toolkit.☆18Aug 1, 2024Updated last year
- Emulating DMA Engines on GPUs for Performance and Portability☆43May 17, 2015Updated 11 years ago
- Examples for MS-AMP package.☆30Jul 17, 2025Updated 10 months ago
- A GPU FP32 computation method with Tensor Cores.☆27Dec 8, 2025Updated 5 months ago
- A selective knowledge distillation algorithm for efficient speculative decoders☆39Nov 27, 2025Updated 5 months ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆29Feb 17, 2025Updated last year
- A study for a custom convolution layer in which the x and y components of an image pixel are added to the kernel inputs.☆12Feb 21, 2020Updated 6 years ago
- Yinghan's Code Sample☆364Jul 25, 2022Updated 3 years ago
- Automatic virtualization of (general) accelerators.☆47Nov 28, 2022Updated 3 years ago
- PromQL parser for Python☆28May 12, 2026Updated 2 weeks ago
- 🐆 A compiler from AI model to RTL (Verilog) accelerator in FPGA hardware with auto design space exploration for *AdderNet*☆22May 27, 2024Updated 2 years ago
- TVMScript kernel for deformable attention☆25Dec 15, 2021Updated 4 years ago
- image to column☆30Jul 15, 2014Updated 11 years ago
- CARMA Streets is a component of CARMA ecosystem, which enables such a coordination among different transportation users. This component p…☆11May 14, 2026Updated last week
- ☆30Updated this week
- Setting up Vscode to work with Pytorch in C/C++ with CUDA support☆25Feb 5, 2025Updated last year
- ☆20Dec 24, 2024Updated last year
- ParaS is an implementation to support SYCL☆25Dec 10, 2025Updated 5 months ago
- ☆23Feb 18, 2025Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)☆66Mar 24, 2025Updated last year