Instead of scrolling on your phone after work in the evening, learn something. Series 1: CUFX (Cuda Framework eXtended), a CUDA compute framework.
☆17Dec 15, 2024Updated last year
Alternatives and similar repositories for CUFX
Users that are interested in CUFX are comparing it to the libraries listed below.
- ☆30Nov 16, 2024Updated last year
- ☆17Jan 10, 2025Updated last year
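- Public practice exercises from the 希加加 training camp☆62Oct 20, 2025Updated 7 months ago
- Backend service for an invoice recognition system built with FastAPI, with concurrency support. Trains an ERFNet model for invoice contour detection, then performs distortion correction, OCR, and template matching; supports skewed invoices. Accuracy: 99.9%.☆13May 8, 2025Updated last year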
- An MLIR-based compiler that takes GPU kernels and compiles them to real hardware instructions. Interactive web visualizer included.☆131Mar 21, 2026Updated 2 months ago
- Take your first step in writing a compiler. Implemented in Rust.☆16Apr 17, 2023Updated 3 years ago
- This is a repository to practice multi-thread programming in C++☆30Feb 21, 2024Updated 2 years ago
- Coursework for Spatiotemporal Data Processing and Organization (including the final project and practicum)☆13Apr 16, 2023Updated 3 years ago
- Source code of the SC '23 paper: "DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multipli…☆29Jun 18, 2024Updated last year
- A compiler for a subset of C, with a backend based on LLVM 20☆12Mar 13, 2025Updated last year
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 4 months ago
- GEMM☆10Aug 26, 2023Updated 2 years ago
- Co-DETR (Detection Transformer) compiled from PyTorch to NVIDIA TensorRT☆20Apr 19, 2025Updated last year
- An out-of-the-box PyTorch scaffold for neural network Quantization-Aware Training (QAT) research. Website: https://github.com/zhutmost/neuralz…☆26Dec 20, 2022Updated 3 years ago
- ☆12Jul 4, 2020Updated 5 years ago
- ☆41Feb 14, 2026Updated 4 months ago
- ☆11May 16, 2026Updated 3 weeks ago
- learn-P4-by-examples: P4 examples with Chinese documents.☆14Oct 25, 2019Updated 6 years ago
- ☆16Mar 8, 2025Updated last year
- PyTorch implementation of ATSS, with multi-GPU distributed training support☆16Jan 3, 2021Updated 5 years ago
- 2023 OceanBase Database Competition, preliminary round☆117May 8, 2024Updated 2 years ago
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.☆33May 26, 2026Updated 2 weeks ago
- ☆17Aug 28, 2025Updated 9 months ago
- JAX interpreter for Vulkan☆17Jun 1, 2021Updated 5 years ago
- 🎓Automatically update circult-eda-mlsys-tinyml papers daily using GitHub Actions (updates every 8 hours)☆10Jun 8, 2026Updated last week
- My tests and experiments with some popular dl frameworks.☆17Sep 11, 2025Updated 9 months ago
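- Documents and lab assignments for Computer Science and Technology courses at the College of Computer Science, Chongqing University☆21Mar 3, 2023Updated 3 years ago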
- GEMV implementation with CUTLASS☆21Aug 21, 2025Updated 9 months ago
- Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression.☆13Sep 29, 2024Updated last year
- ☆14Nov 3, 2025Updated 7 months ago
- Free resource for the book AI Compiler Development Guide☆50Dec 22, 2022Updated 3 years ago
- A modern C++ library for working with JSON data, aims to provide full support for the JSON standard, as well as allowing users to customi…☆12Apr 8, 2026Updated 2 months ago
- Companion code for 《汇编语言一发入魂》☆15May 30, 2020Updated 6 years ago
- Handy tools & graphics API abstraction for blazing fast prototyping☆10Jan 17, 2024Updated 2 years ago
- This project aims to provide a highly effective KV cache management framework for LLM inference and improve memory utilization and inference sp…☆61Apr 24, 2026Updated last month
- Hands-on big data project: real-time statistical analysis of news topics based on Spark 2.x☆26Jul 1, 2022Updated 3 years ago
- A deep learning training framework implemented from scratch in C++ and Python☆12Nov 22, 2020Updated 5 years ago
- DoubleAI’s hyperoptimised version of cuGraph☆60Mar 3, 2026Updated 3 months ago
- ☆18Nov 22, 2025Updated 6 months ago