End-to-end steps for adding custom ops in PyTorch.
☆24Aug 20, 2020Updated 5 years ago
Alternatives and similar repositories for pytorch_custom_op
Users who are interested in pytorch_custom_op are comparing it to the libraries listed below.
- A pseudo random number generator library written against the SYCL API.☆11Jun 11, 2019Updated 7 years ago
- An intelligent matrix format designer for SpMV☆10Oct 10, 2023Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library.☆19Nov 19, 2024Updated last year
- Marek's approach to building AMD GPU drivers for driver development☆28Jun 1, 2026Updated 2 weeks ago
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated last year
- A warp-oriented dynamic hash table for GPUs☆76Jan 19, 2024Updated 2 years ago
- HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of…☆206May 22, 2026Updated 3 weeks ago
- My tests and experiments with some popular dl frameworks.☆17Sep 11, 2025Updated 9 months ago
- ☆23Feb 16, 2022Updated 4 years ago
- An Easy To Use PyTorch Computer Vision Library☆53Jul 6, 2023Updated 2 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆13Nov 23, 2024Updated last year
- Open-source repository for paper "LogGrep: Fast and Cheap Cloud Log Storage by Exploiting both Static and Runtime Patterns"(ACM Eurosys 2…☆27Sep 12, 2023Updated 2 years ago
- clp-ffi-py is a Python library to encode log messages with CLP, and work with the encoded messages using a foreign function interface (FF…☆12Oct 21, 2025Updated 7 months ago
- A set of tools to extract library signature of binary programs at runtime.☆28Jan 21, 2026Updated 4 months ago
- Public proposals, extensions, information and materials from the SYCL working group☆15Jan 26, 2024Updated 2 years ago
- Blazingly fast neighborhood attention☆14Nov 28, 2023Updated 2 years ago
- Source code for the paper "LongGenBench: Long-context Generation Benchmark"☆23Oct 8, 2024Updated last year
- Hardware go brrr bounded context suffix array construction algorithm☆19Nov 1, 2023Updated 2 years ago
- Utility for OpenAI GPT Functions☆14Jun 25, 2023Updated 2 years ago
- DeeperGEMM: crazy optimized version☆86May 5, 2025Updated last year
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.☆19May 12, 2024Updated 2 years ago
- A demo of Redis Enterprise as the Online Feature Store deployed on GCP with Feast and NVIDIA Triton Inference Server.☆15May 9, 2023Updated 3 years ago
- High-speed Bloom filters and taffy filters for C, C++, and Java☆35Aug 9, 2023Updated 2 years ago
- Noisy language compiler☆17Jul 31, 2024Updated last year
- Commands that will make you more comfortable with the ROCm toolkit.☆18Aug 1, 2024Updated last year
- Emulating DMA Engines on GPUs for Performance and Portability☆43May 17, 2015Updated 11 years ago
- Examples for MS-AMP package.☆30Jul 17, 2025Updated 10 months ago
- A GPU FP32 computation method with Tensor Cores.☆27Dec 8, 2025Updated 6 months ago
- A selective knowledge distillation algorithm for efficient speculative decoders☆40Nov 27, 2025Updated 6 months ago
- A study for a custom convolution layer in which the x and y components of an image pixel are added to the kernel inputs.☆12Feb 21, 2020Updated 6 years ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆29Feb 17, 2025Updated last year
- Yinghan's Code Sample☆364Jul 25, 2022Updated 3 years ago
- Automatic virtualization of (general) accelerators.☆47Nov 28, 2022Updated 3 years ago
- Site to match up developers with people willing to comment on their pull requests☆30Dec 14, 2022Updated 3 years ago
- PromQL parser for Python☆28Jun 8, 2026Updated last week
- PyTorch implementations of the BNN, XNOR-Net and BiReal-Net☆15Aug 20, 2020Updated 5 years ago
- Benchmark tools for LightGBM☆15Jul 28, 2023Updated 2 years ago
- 🐆 A compiler from AI model to RTL (Verilog) accelerator in FPGA hardware with auto design space exploration for *AdderNet*☆22May 27, 2024Updated 2 years ago
- image to column☆30Jul 15, 2014Updated 11 years ago