☆36Aug 25, 2023Updated 2 years ago
Alternatives and similar repositories for conv2d_direct
Users who are interested in conv2d_direct are comparing it to the libraries listed below.
- Taiwei-3D-Flow☆42Updated this week
- Fast GPU based tensor core reductions☆13Jan 13, 2023Updated 3 years ago
- ☆20Aug 20, 2025Updated 6 months ago
- A CUDA kernel for NHWC GroupNorm for PyTorch☆23Nov 15, 2024Updated last year
- from MHA, MQA, GQA to MLA by 苏剑林, with code☆43Feb 19, 2025Updated last year
- SeekFree RT1064 Library GCC(VSCode) Porting☆12Oct 8, 2021Updated 4 years ago
- A set of examples around MegEngine☆31Dec 8, 2023Updated 2 years ago
- ☆52Jan 5, 2026Updated 2 months ago
- From Minimal GEMM to Everything☆163Feb 10, 2026Updated 3 weeks ago
- Implement custom operators in PyTorch with cuda/c++☆77Jan 1, 2023Updated 3 years ago
- Some "Formula Translations" for Yousef Saad's book "Iterative Methods for Sparse Linear Systems (2nd Edition)"☆13Jan 14, 2018Updated 8 years ago
- Algorithm notes taken while learning from ZuoShen.☆10Jun 30, 2022Updated 3 years ago
- Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception"☆29Jan 13, 2026Updated last month
- CenterPoint model trained with MMDetection3d on custom dataset, and deployed with TensorRT☆35Mar 15, 2023Updated 2 years ago
- ☆23Feb 26, 2026Updated last week
- LLM-DSE: Searching Accelerator Parameters with LLM Agents☆13May 22, 2025Updated 9 months ago
- ☆33Dec 10, 2025Updated 2 months ago
- ☆11Aug 8, 2018Updated 7 years ago
- Homework assignments for Andrew Ng's deep learning course☆10Jan 28, 2020Updated 6 years ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 5 months ago
- ☆18Feb 13, 2026Updated 3 weeks ago
- Learning various topics by following Tensorrt_pro☆40Nov 25, 2022Updated 3 years ago
- A Flexible Cache Architectural Simulator☆17Sep 16, 2025Updated 5 months ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 6 months ago
- nscscc2024, competition entry from HPU (Henan Polytechnic University), the Liangyi (两仪) processor☆11Aug 24, 2024Updated last year
- a fast and customizable CUDA int4 tensor core gemm☆15Aug 2, 2024Updated last year
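- Today's vast number of mobile apps are closely tied to people's lives, work, study, leisure, and entertainment, and play an important role. Most apps request phone permissions from the user when they are installed or updated. Most end users lack the ability to judge whether the permissions an app requests are reasonable, and excessive permission requests during app installation and use are fairly common, which leaves user data security and private information exposed to leak…☆13Feb 11, 2020Updated 6 years ago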
- RISCV CPU implementation tutorial steps for Cologne Chip Gatemate E1, adopted from https://github.com/BrunoLevy/learn-fpga☆15Updated this week
- RISC-V Zve32x, Zve32f, Zvfh Vector Coprocessor☆16Feb 17, 2026Updated 2 weeks ago
- ☆12Jan 13, 2023Updated 3 years ago
- ☆12Oct 8, 2024Updated last year
- OpenFOAM right wmake at the right time☆11Mar 10, 2019Updated 6 years ago
- ☆11Dec 23, 2025Updated 2 months ago
- ☆13Jan 18, 2020Updated 6 years ago
- ☆14Nov 10, 2019Updated 6 years ago
- ☆59Mar 8, 2025Updated 11 months ago
- Brushless motor driver (firmware + circuit board): FOC for BLDC motor, code and PCB project☆14Jan 27, 2024Updated 2 years ago
- ☆14Jul 16, 2020Updated 5 years ago
- ☆11Sep 23, 2023Updated 2 years ago