End-to-end steps for adding custom ops in PyTorch.
☆24Aug 20, 2020Updated 5 years ago
Alternatives and similar repositories for pytorch_custom_op
Users that are interested in pytorch_custom_op are comparing it to the libraries listed below.
- An intelligent matrix format designer for SpMV☆10Oct 10, 2023Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- Marek's approach to building AMD GPU drivers for driver development☆27Oct 13, 2025Updated 5 months ago
- CAKE Library for constant-bandwidth matrix multiplication on CPUs☆14Apr 6, 2024Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated 11 months ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of…☆197Feb 27, 2026Updated last month
- My tests and experiments with some popular dl frameworks.☆17Sep 11, 2025Updated 6 months ago
- Example to build PyTorch CUDA extension using CMake (with pybind11 and scikit-build)☆12May 26, 2020Updated 5 years ago
- ☆23Feb 16, 2022Updated 4 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆13Nov 23, 2024Updated last year
- Open-source repository for paper "LogGrep: Fast and Cheap Cloud Log Storage by Exploiting both Static and Runtime Patterns" (ACM Eurosys 2…☆26Sep 12, 2023Updated 2 years ago
- Public proposals, extensions, information and materials from the SYCL working group☆15Jan 26, 2024Updated 2 years ago
- Source code for the paper "LongGenBench: Long-context Generation Benchmark"☆23Oct 8, 2024Updated last year
- DeeperGEMM: crazy optimized version☆75May 5, 2025Updated 10 months ago
- Utility for OpenAI GPT Functions☆14Jun 25, 2023Updated 2 years ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆29Feb 17, 2025Updated last year
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.☆19May 12, 2024Updated last year
- A demo of Redis Enterprise as the Online Feature Store deployed on GCP with Feast and NVIDIA Triton Inference Server.☆15May 9, 2023Updated 2 years ago
- Noisy language compiler☆17Jul 31, 2024Updated last year
- High-speed Bloom filters and taffy filters for C, C++, and Java☆35Aug 9, 2023Updated 2 years ago
- A selective knowledge distillation algorithm for efficient speculative decoders☆36Nov 27, 2025Updated 4 months ago
- A GPU FP32 computation method with Tensor Cores.☆26Dec 8, 2025Updated 3 months ago
- Yinghan's Code Sample☆366Jul 25, 2022Updated 3 years ago
- Automatic virtualization of (general) accelerators.☆47Nov 28, 2022Updated 3 years ago
- PyTorch implementations of the BNN, XNOR-Net and BiReal-Net☆15Aug 20, 2020Updated 5 years ago
- Benchmark tools for LightGBM☆15Jul 28, 2023Updated 2 years ago
- 🐆 A compiler from AI model to RTL (Verilog) accelerator in FPGA hardware with auto design space exploration for *AdderNet*☆21May 27, 2024Updated last year
- Linux io_uring based C++20 coroutine library☆28Jun 21, 2022Updated 3 years ago
- image to column☆30Jul 15, 2014Updated 11 years ago
- ☆20Feb 12, 2025Updated last year
- CARMA Streets is a component of CARMA ecosystem, which enables such a coordination among different transportation users. This component p…☆11Mar 10, 2026Updated 2 weeks ago
- ☆19Dec 24, 2024Updated last year
- ☆22Feb 18, 2025Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)☆66Mar 24, 2025Updated last year
- QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning☆171Nov 11, 2025Updated 4 months ago
- Official code of paper "MICSim: A Modular Simulator for Mixed-signal Compute-in-Memory based AI Accelerator", ASP-DAC 2025☆36Oct 15, 2025Updated 5 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders☆25Feb 21, 2025Updated last year
- CUDA implementation of the forward and backward passes of 'huawei-noah/AdderNet'☆18Apr 16, 2020Updated 5 years ago
- SocksDirect code repository☆19Jun 26, 2022Updated 3 years ago