Learning and practice of high-performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
☆62Mar 23, 2025Updated 11 months ago
Alternatives and similar repositories for hpc
Users that are interested in hpc are comparing it to the libraries listed below
- C++ implementation of mmpose inference for pose estimation, based on MNN☆12Mar 9, 2021Updated 4 years ago
- Dynamic matrix type and algorithms for sparse matrices☆23Feb 12, 2025Updated last year
- TensorRT deployment tutorial☆11Aug 1, 2025Updated 7 months ago
- ☆10Jun 29, 2020Updated 5 years ago
- ☆12Jan 25, 2023Updated 3 years ago
- A thread safe simple C++ wrapper for FFTW & MKL☆17Sep 27, 2021Updated 4 years ago
- ☆11Mar 3, 2020Updated 5 years ago
- Lightweight face detectors with landmarks. Training code using pytorch and inference using pytorch/ncnn/tensorflow/tflite.☆10Jul 1, 2020Updated 5 years ago
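- Fast object detection based on yolov3 and TensorRT, plus binocular stereo reconstruction based on sgm and cuda; sends the class, probability, and the object's xyz in the camera coordinate system☆17Dec 25, 2020Updated 5 years ago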
- OpenFPM: A scalable open framework for particle and particle-mesh codes on parallel computers☆23Jan 20, 2026Updated last month
- A rknn cpp/c++ inference codebase for yolov5.☆31Aug 25, 2021Updated 4 years ago
- deploy onnx models with TensorRT and LibTorch☆19Nov 17, 2021Updated 4 years ago
- ROS/ROS2 package tag detection using the UMich or MIT Apriltag library☆20Dec 3, 2025Updated 2 months ago
- the C++ version of thundernet with ncnn☆14Feb 20, 2021Updated 5 years ago
- ☆13Oct 8, 2018Updated 7 years ago
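- yolov8n deployment version: exports the onnx model with the official onnx export script and runs deployment tests on different platforms, to ease porting across platforms (onnx, tensorRT, rknn, Horizon).☆39May 26, 2023Updated 2 years ago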
- This project provides a face recognition system via opencv4☆18Jan 16, 2019Updated 7 years ago
- MNN version demo of [ECCV2022] Joint Feature Learning and Relation Modeling for Tracking: A One-Stream Framework.☆20Aug 30, 2023Updated 2 years ago
- Train SNet (from ThunderNet) on ImageNet☆18Mar 4, 2020Updated 5 years ago
- the C++ version of Transformer with ncnn☆20Jul 4, 2021Updated 4 years ago
- ☆23Dec 8, 2022Updated 3 years ago
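- Personal collection and summaries across many areas, covering algorithms, data structures, programming languages, operating systems, machine learning and deep learning, distributed systems, big data processing, and learning resources and methods.☆21Apr 23, 2023Updated 2 years ago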
- Bilinear Image Resize with openmp/cuda☆25Dec 12, 2017Updated 8 years ago
- A project and machine deployment model using Spack☆29Feb 5, 2026Updated 3 weeks ago
- CUDA based parallel Image processing tool☆23Jan 11, 2017Updated 9 years ago
- TensorRT person tracking RFBNet300☆30Mar 5, 2020Updated 5 years ago
- ☆24Sep 6, 2021Updated 4 years ago
- Concurrent CPU-GPU Programming using Task Models☆106Dec 19, 2019Updated 6 years ago
- Code for paper Background Prompting for Improved Object Depth☆29Sep 7, 2023Updated 2 years ago
- A high-level Parallel I/O Library for structured grid applications☆22Feb 11, 2026Updated 2 weeks ago
- SCR caches checkpoint data in storage on the compute nodes of a Linux cluster to provide a fast, scalable checkpoint / restart capability…☆103Oct 28, 2025Updated 4 months ago
- LPD-Net: 3D Point Cloud Learning for Large-Scale Place Recognition and Environment Analysis (ICCV 2019)☆22Nov 4, 2020Updated 5 years ago
- Open-source video face tracking algorithm: MNN-based mtcnn face detection + onet face tracking; detection speed can reach 500 fps on an i7-9700k CPU☆23May 29, 2020Updated 5 years ago
- A C++ class to represent a sparse matrix in compressed row format. Useful for FEM codes.☆25Nov 19, 2011Updated 14 years ago
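- Chapter 1: Pointers; Chapter 2: CUDA principles; Chapter 3: CUDA compiler environment setup; Chapter 4: kernel function basics; Chapter 5: kernel indexing (index); Chapter 6: kernel matrix computation in practice; Chapter 7: advanced kernel practice; Chapter 8: CUDA memory usage and performance optimization; Chapter 9: CUDA atomic (a…☆31Aug 16, 2024Updated last year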
- Automates using spack to build and deploy software☆30Jan 9, 2026Updated last month
- pytorch face_landmark☆25Jan 11, 2023Updated 3 years ago
- An MPI ABI compatibility layer☆34Aug 20, 2025Updated 6 months ago
- Matrix Multiplication on GPU using Shared Memory considering Coalescing and Bank Conflicts☆25Aug 29, 2022Updated 3 years ago