wqzustc / High-Performance-Tensor-Processing-Engines
Some Hardware Architectures for GEMM
☆282Updated 6 months ago
Alternatives and similar repositories for High-Performance-Tensor-Processing-Engines
Users that are interested in High-Performance-Tensor-Processing-Engines are comparing it to the libraries listed below
- ☆137Updated 4 months ago
- ☆24Updated last year
- Official implementation of "REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving" (NeurIPS 2025)☆94Updated last week
- Host shell scripts: configure FPGA's DMA-SG via PCIe XDMA.☆26Updated 5 months ago
- CXL remote offloading data movement aware compiler☆70Updated last week
- YiRage (Yield Revolutionary AGile Engine) - Multi-Backend LLM Inference Optimization. Extends Mirage with comprehensive support for CUDA,…☆35Updated this week
- Step-by-step optimization of TPU MatMul Kernels☆85Updated 4 months ago
- Vitis HLS 2022.2 projects source code: C design, C simulation, RTL simulation. [vitis_hls project]☆23Updated 6 months ago
- [NeurIPS'25] KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems☆102Updated last month
- [NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models☆1,157Updated 2 months ago
- rCore-Tutorial without branches☆30Updated 11 months ago
- Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate (NeurIPS 2024)☆32Updated last year
- [NeurIPS 2025] Accelerating Parallel Diffusion Model Serving with Residual Compression☆39Updated last month
- ☆119Updated this week
- ☆48Updated 4 months ago
- This is a deep learning project applied to signal integrity and RF analysis. Automated modeling, simulation, and data storage of HFSS for…☆68Updated 3 months ago
- A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks.☆196Updated last year
- [MM 2024] Official code for VeCAF: Vision-language Collaborative Active Finetuning with Training Objective Awareness☆52Updated last year
- [TMC 2025/NOSSDAV 2023] Official code for RepCaM++ and RepCaM: Re-parameterization Content-aware Modulation for Neural Video Delivery☆54Updated 7 months ago
- RKAN: Residual Kolmogorov-Arnold Network is designed to enhance the performance of deep learning models.☆273Updated last month
- The Python implementation of some deep text hashing (also called deep semantic hashing) Models☆80Updated last week
- ☆172Updated last week
- Group Expectation Policy Optimization for Heterogeneous Reinforcement Learning☆164Updated 3 weeks ago
- A SAR domain-specific language defined in CXX & Python. Keywords: AST, MLIR, LLVM, FPGA HLS. Currently under development...☆17Updated last week
- Welcome to BlockSeek's official documentation. BlockSeek combines state-of-the-art AI with blockchain technology to revolutionize cryptoc…☆310Updated 10 months ago
- We introduce the Audio Logical Reasoning (ALR) dataset, consisting of 6,446 text-audio annotated samples specifically designed for comple…☆1,101Updated 2 weeks ago
- JittorGeometric is a Jittor-based graph machine learning library.☆453Updated 3 months ago
- ☆530Updated 10 months ago
- GigaDatasets: A Unified and Lightweight Framework for Data Processing, Curation, and Visualization☆137Updated last month
- Advanced Quantitative Factor Research: ML-powered stock return prediction with 72% performance improvement. Features comprehensive alpha …☆368Updated 3 months ago