Instead of scrolling on your phone after work, learn something. Series 1: the CUDA computing framework CUFX (Cuda Framework eXtended).
☆16Dec 15, 2024Updated last year
Alternatives and similar repositories for CUFX
Users that are interested in CUFX are comparing it to the libraries listed below.
- ☆10Aug 20, 2018Updated 7 years ago
- ☆30Nov 16, 2024Updated last year
- CS341 for Spring 2024☆11Jul 15, 2024Updated last year
- This is a repository to practice multi-thread programming in C++☆29Feb 21, 2024Updated 2 years ago
- Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multipli…☆29Jun 18, 2024Updated last year
- Persistent Bloom Filter☆12Jul 21, 2018Updated 7 years ago
- A compiler for a subset of C, with a backend based on LLVM 20☆12Mar 13, 2025Updated last year
- ☆15Jan 16, 2024Updated 2 years ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 2 months ago
- CS6868: Concurrent Programming☆70Apr 20, 2026Updated 2 weeks ago
- Deep Generative Models course, 2025☆10Jun 5, 2025Updated 11 months ago
- GEMM☆10Aug 26, 2023Updated 2 years ago
- Co-DETR (Detection Transformer) compiled from PyTorch to NVIDIA TensorRT☆20Apr 19, 2025Updated last year
- A Out-of-box PyTorch Scaffold for Neural Network Quantization-Aware-Training (QAT) Research. Website: https://github.com/zhutmost/neuralz…☆25Dec 20, 2022Updated 3 years ago
- ☆12Jul 4, 2020Updated 5 years ago
- Detects impurities in tea leaves using YOLOv4 and computes the recognition rate with a confusion matrix☆18Aug 25, 2020Updated 5 years ago
- ☆11Sep 21, 2022Updated 3 years ago
- learn-P4-by-examples: P4 examples with Chinese documents.☆14Oct 25, 2019Updated 6 years ago
- Naos: Serialization-free RDMA networking in Java☆17Aug 17, 2021Updated 4 years ago
- ☆16Mar 8, 2025Updated last year
- PyTorch implementation of ATSS, with support for multi-GPU distributed training☆16Jan 3, 2021Updated 5 years ago
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.☆30Apr 27, 2026Updated last week
- ☆15Aug 28, 2025Updated 8 months ago
- JAX interpreter for Vulkan☆17Jun 1, 2021Updated 4 years ago
- GEMV implementation with CUTLASS☆21Aug 21, 2025Updated 8 months ago
- Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression.☆13Sep 29, 2024Updated last year
- ☆14Nov 3, 2025Updated 6 months ago
- Free resource for the book AI Compiler Development Guide☆50Dec 22, 2022Updated 3 years ago
- Companion code for the book 《汇编语言一发入魂》☆15May 30, 2020Updated 5 years ago
- Handy tools & graphics API abstraction for blazing fast prototyping☆10Jan 17, 2024Updated 2 years ago
- Deep Learning Demo☆18Oct 14, 2018Updated 7 years ago
- Cute layout visualization☆38Jan 18, 2026Updated 3 months ago
- DoubleAI’s hyperoptimised version of cuGraph☆51Mar 3, 2026Updated 2 months ago
- ☆13Jan 15, 2022Updated 4 years ago
- ☆18Nov 22, 2025Updated 5 months ago
- This project aims to replicate mainstream open-source model architectures with limited computational resources, implementing mini models …☆180Apr 27, 2026Updated last week
- ☆12Jan 25, 2023Updated 3 years ago
- Implement well-known NLP models from scratch with high-level APIs.☆16Jul 31, 2021Updated 4 years ago
- Hands-On TensorBoard for PyTorch Developers, Published by Packt☆11Dec 15, 2025Updated 4 months ago