Penn CIS 5650 (GPU Programming and Architecture) Final Project
☆44Dec 11, 2023Updated 2 years ago
Alternatives and similar repositories for Efficient-LLM-Inferencing-on-GPUs
Users who are interested in Efficient-LLM-Inferencing-on-GPUs are comparing it to the libraries listed below.
- ☆10Oct 8, 2021Updated 4 years ago
- ☆11May 16, 2026Updated last month
- 🎓 Automatically update circult-eda-mlsys-tinyml papers daily using GitHub Actions (updates every 8 hours)☆10Jun 8, 2026Updated last week
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆32Apr 2, 2025Updated last year
- ☆14Nov 3, 2025Updated 7 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper☆110Jul 27, 2018Updated 7 years ago
- ☆13Jun 23, 2022Updated 3 years ago
- ☆29Oct 20, 2019Updated 6 years ago
- Make triton easier☆50Jun 12, 2024Updated 2 years ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …☆32Dec 21, 2024Updated last year
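- This repository deploys the Nanodet detection algorithm on the OpenVINO inference framework, with rewritten pre-processing and post-processing for very high performance and much faster detection on Intel CPU platforms. The model is also quantized (PTQ) to int8 precision with the NNCF and PPQ tools for even faster inference.☆16Jun 14, 2023Updated 3 years ago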
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆122Mar 13, 2024Updated 2 years ago
- RISCV C and Triton AI-Benchmark☆25Jan 28, 2026Updated 4 months ago
- A practical way of learning Swizzle☆41Feb 3, 2025Updated last year
- An ethnic culture dataset for large models☆12May 26, 2025Updated last year
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews☆18Dec 14, 2025Updated 6 months ago
- The official code for Dropping Backward Propagation (DropBP)☆32Oct 29, 2024Updated last year
- 1st Place Solution to iWildcam 2021: Count the number of animals of each species present in a sequence of images☆12Jun 24, 2021Updated 4 years ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable☆221Sep 21, 2024Updated last year
- Optimize softmax in triton in many cases☆24Sep 6, 2024Updated last year
- SGEMM optimization with cuda step by step☆22Mar 23, 2024Updated 2 years ago
- EESAST 2020 summer training☆28Jan 24, 2023Updated 3 years ago
- Implementation of AdaCQR(COLING 2025)☆15Dec 30, 2024Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆111Jun 28, 2025Updated 11 months ago
- ☆32Jul 17, 2024Updated last year
- An application to simulate Tomasulo's algorithm☆11Jan 16, 2014Updated 12 years ago
- ☆22Mar 5, 2024Updated 2 years ago
- An fp8 flash attention implementation on the Ada architecture, built with the cutlass repository☆81Aug 12, 2024Updated last year
- mHC kernels implemented in CUDA☆267Mar 9, 2026Updated 3 months ago
- A simple Transformer where the softmax has been replaced with normalization☆20Sep 11, 2020Updated 5 years ago
- Notes on understanding the tensorRT_Pro open-source project☆22Feb 23, 2023Updated 3 years ago
- MindVision camera driver☆13Mar 13, 2018Updated 8 years ago
- Artifact evaluation for HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele…☆11Mar 3, 2024Updated 2 years ago
- ☆49Dec 11, 2020Updated 5 years ago
- ☆33Jul 23, 2024Updated last year
- ☆107Sep 9, 2024Updated last year
- A llama model inference framework implemented in CUDA C++☆65Nov 8, 2024Updated last year
- Online documentation can be found at https://minres.github.io/SCViewer/☆21Apr 10, 2026Updated 2 months ago
- ☆48Sep 13, 2025Updated 9 months ago