LLM Inference with Deep Learning Accelerator.
☆59Jan 23, 2025Updated last year
Alternatives and similar repositories for LLM-Inference-Acceleration
Users that are interested in LLM-Inference-Acceleration are comparing it to the libraries listed below
- GPU-accelerated LLM Training Simulator☆17Jun 26, 2025Updated 8 months ago
- Introduces AI infrastructure knowledge. A knowledge base on the fundamentals of AI system architecture☆16Jun 4, 2023Updated 2 years ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆20Aug 3, 2025Updated 7 months ago
- Awesome code, projects, books, etc. related to CUDA☆31Feb 3, 2026Updated last month
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆120Mar 13, 2024Updated last year
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆273Aug 6, 2025Updated 6 months ago
- spark-sight: Spark performance at a glance☆10Apr 6, 2023Updated 2 years ago
- MATLAB/Octave generator of Hamming ECC coding. Output format is Verilog HDL.☆12Dec 27, 2022Updated 3 years ago
- This project is intended to build and deploy an SNPE model on Qualcomm Devices, which are having unsupported layers which are not part of…☆10Oct 4, 2021Updated 4 years ago
- A comprehensive list of pain intensity classification papers mainly based on deep learning algorithms☆12Oct 20, 2024Updated last year
- A throughput-oriented high-performance serving framework for LLMs☆947Oct 29, 2025Updated 4 months ago
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond☆796Updated this week
- Normalization Matters in Weakly Supervised Object Localization (ICCV 2021)☆11Oct 24, 2021Updated 4 years ago
- Ask question to your PDF☆11Jun 11, 2023Updated 2 years ago
- Official implementation of REArtGS (NeurIPS 2025)☆19Oct 24, 2025Updated 4 months ago
- Deploys LDC, a lightweight dense convolutional neural network for edge detection, with ONNXRuntime; includes both C++ and Python versions of the program☆11Apr 24, 2023Updated 2 years ago
- Implement some method of LLM KV Cache Sparsity☆40Jun 6, 2024Updated last year
- Source code analysis of langgraph's deepagent☆15Jan 1, 2026Updated 2 months ago
- Verilog implementation of MC68851 Memory Management Unit☆13Feb 26, 2018Updated 8 years ago
- [ICLR2026] The code for "Interp3D: Correspondence-Aware Interpolation for Generative Textured 3D Morphing."☆24Jan 21, 2026Updated last month
- Delve is a debugger for the Go programming language.☆11Apr 9, 2023Updated 2 years ago
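- FastTrack4LLM is a learning and practice framework for people studying large models. It helps them grasp the core principles and training pipeline of large models so that everyone can truly understand their internal mechanisms. The project not only fully reproduces mainstream open-source large model architectures such as LLaMA, Qwen, and DeepSeek, but also covers the full lifecycle of large models: To…☆24Nov 6, 2025Updated 3 months ago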
- Solutions for Programming Massively Parallel Processors, Second Edition☆35Jun 4, 2022Updated 3 years ago
- Large Language Model Onnx Inference Framework☆35Nov 25, 2025Updated 3 months ago
- ☆12Aug 26, 2016Updated 9 years ago
- ☆94Feb 11, 2026Updated 3 weeks ago
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.☆46Jun 11, 2025Updated 8 months ago
- Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…☆282Mar 6, 2025Updated 11 months ago
- ☆10Jul 18, 2024Updated last year
- Distributed SDDMM Kernel☆12Jul 8, 2022Updated 3 years ago
- A minimal example of an async web server talking to browsers with WebSockets and redis with aioredis☆10Feb 15, 2018Updated 8 years ago
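- DETR tensor: removes the auxiliary heads that are unused during inference, adds further speedup through fp16 deployment, and offers a new method for fixing all-zero outputs after conversion to TensorRT.☆12Jan 9, 2024Updated 2 years ago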
- Code for "Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling" [ICML 2021]☆10Mar 14, 2022Updated 3 years ago
- ☆11Jan 20, 2023Updated 3 years ago
- llm theoretical performance analysis tools and support params, flops, memory and latency analysis.☆115Jul 11, 2025Updated 7 months ago
- Refactored NeRF code that is easier to read☆13Mar 26, 2023Updated 2 years ago
- [ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation☆22May 29, 2025Updated 9 months ago
- Model summary of keras pre-trained neural networks.☆12Aug 1, 2019Updated 6 years ago
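- Approaches and optimizations for using parallel processing when an algorithm is too slow to handle real-time video streams (at the application level).☆11Jun 5, 2021Updated 4 years ago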