☆36 Aug 25, 2023 Updated 2 years ago
Alternatives and similar repositories for conv2d_direct
Users that are interested in conv2d_direct are comparing it to the libraries listed below
- ☆20 Aug 20, 2025 Updated 6 months ago
- Fast GPU based tensor core reductions ☆13 Jan 13, 2023 Updated 3 years ago
- A CUDA kernel for NHWC GroupNorm for PyTorch ☆23 Nov 15, 2024 Updated last year
- SeekFree RT1064 Library GCC(VSCode) Porting ☆12 Oct 8, 2021 Updated 4 years ago
- A set of examples around MegEngine ☆31 Dec 8, 2023 Updated 2 years ago
- ☆52 Jan 5, 2026 Updated 2 months ago
- From Minimal GEMM to Everything ☆163 Feb 10, 2026 Updated 3 weeks ago
- Implement custom operators in PyTorch with cuda/c++ ☆77 Jan 1, 2023 Updated 3 years ago
- CUDA GPU implementation of GMRES iterative Solver ☆10 Apr 16, 2012 Updated 13 years ago
- ☆39 Apr 9, 2024 Updated last year
- CenterPoint model trained with MMDetection3d on custom dataset, and deployed with TensorRT ☆35 Mar 15, 2023 Updated 2 years ago
- ☆23 Feb 26, 2026 Updated last week
- ☆33 Dec 10, 2025 Updated 2 months ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… ☆23 Oct 1, 2025 Updated 5 months ago
- LLM-DSE: Searching Accelerator Parameters with LLM Agents ☆13 May 22, 2025 Updated 9 months ago
- ☆18 Feb 13, 2026 Updated 3 weeks ago
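- Homework assignments for Andrew Ng's deep learning course ☆10 Jan 28, 2020 Updated 6 years ago
- Learning various topics by following Tensorrt_pro ☆40 Nov 25, 2022 Updated 3 years ago
- nscscc2024 competition entry from HPU (Henan Polytechnic University), the Liangyi processor ☆11 Aug 24, 2024 Updated last year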
- ☆14 Nov 10, 2019 Updated 6 years ago
- ☆12 Sep 18, 2024 Updated last year
- RISCV CPU implementation tutorial steps for Cologne Chip Gatemate E1, adopted from https://github.com/BrunoLevy/learn-fpga ☆15 Updated this week
- Minitorch Self-Study Guide (SAIA) ☆11 Oct 9, 2022 Updated 3 years ago
- Audio-only Emotion Detection using Federated Learning ☆10 Dec 8, 2022 Updated 3 years ago
- OpenFOAM right wmake at the right time ☆11 Mar 10, 2019 Updated 6 years ago
- ☆13 Jan 18, 2020 Updated 6 years ago
- A Flexible Cache Architectural Simulator ☆17 Sep 16, 2025 Updated 5 months ago
- RISC-V Zve32x, Zve32f, Zvfh Vector Coprocessor ☆16 Feb 17, 2026 Updated 2 weeks ago
- a fast and customizable CUDA int4 tensor core gemm ☆15 Aug 2, 2024 Updated last year
- ☆11 Dec 23, 2025 Updated 2 months ago
- ☆59 Mar 8, 2025 Updated 11 months ago
- Third-prize entry in the 2022 Loongson Cup individual competition ☆14 Oct 11, 2023 Updated 2 years ago
- Brushless motor driver (firmware and circuit board): FOC for BLDC motor, code and PCB project ☆14 Jan 27, 2024 Updated 2 years ago
- ☆14 Jul 16, 2020 Updated 5 years ago
- ☆11 Sep 23, 2023 Updated 2 years ago
- ICTNet: a novel network for semantic segmentation with the underlying architecture of a fully convolutional network, infused with feature… ☆10 May 27, 2020 Updated 5 years ago
- Official implementation of paper "Self-Supervised Noise Modeling and Sparsity Guided Cryo-ET Image Denoising". ☆16 Sep 10, 2024 Updated last year
- ☆11 Sep 29, 2021 Updated 4 years ago
- A Distributed Denial of Service Detector and mitigator based on Extended Berkeley Packet Filters (eBPF) and Xpress Data Path (XDP) ☆13 Oct 22, 2021 Updated 4 years ago