Penn CIS 5650 (GPU Programming and Architecture) Final Project
☆44 · Dec 11, 2023 · Updated 2 years ago
Alternatives and similar repositories for Efficient-LLM-Inferencing-on-GPUs
Users that are interested in Efficient-LLM-Inferencing-on-GPUs are comparing it to the libraries listed below.
- ☆10 · Oct 8, 2021 · Updated 4 years ago
- ☆11 · Sep 21, 2022 · Updated 3 years ago
- 🎓 Automatically updates circult-eda-mlsys-tinyml papers daily using GitHub Actions (updates every 8 hours) ☆10 · Updated this week
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 · Apr 2, 2025 · Updated 11 months ago
- ☆14 · Nov 3, 2025 · Updated 4 months ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆109 · Jul 27, 2018 · Updated 7 years ago
- ☆13 · Jun 23, 2022 · Updated 3 years ago
- ☆29 · Oct 20, 2019 · Updated 6 years ago
- PyTorch implementation of GRPO. ☆15 · Apr 21, 2025 · Updated 11 months ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆30 · Dec 21, 2024 · Updated last year
- Deploys the Nanodet detection algorithm with the OpenVINO inference framework, with rewritten pre-processing and post-processing for very high performance and fast detection on Intel CPU platforms. The model is also quantized (PTQ) to int8 precision with the NNCF and PPQ tools for even faster inference. ☆16 · Jun 14, 2023 · Updated 2 years ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆119 · Mar 13, 2024 · Updated 2 years ago
- A lightweight inference framework for large models ☆22 · May 26, 2025 · Updated 10 months ago
- RISCV C and Triton AI-Benchmark ☆23 · Jan 28, 2026 · Updated 2 months ago
- A practical way of learning Swizzle ☆37 · Feb 3, 2025 · Updated last year
- An ethnic culture dataset for large models ☆12 · May 26, 2025 · Updated 10 months ago
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews ☆17 · Dec 14, 2025 · Updated 3 months ago
- The official code for Dropping Backward Propagation (DropBP) ☆32 · Oct 29, 2024 · Updated last year
- 1st Place Solution to iWildcam 2021: Count the number of animals of each species present in a sequence of images ☆12 · Jun 24, 2021 · Updated 4 years ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆211 · Sep 21, 2024 · Updated last year
- Optimize softmax in triton in many cases ☆23 · Sep 6, 2024 · Updated last year
- Deep Introspective SLAM: Deep Reinforcement Learning based Approach to Avoid Tracking Failure in Visual SLAM ☆11 · Jul 31, 2021 · Updated 4 years ago
- EESAST 2020 summer training ☆28 · Jan 24, 2023 · Updated 3 years ago
- Implementation of AdaCQR (COLING 2025) ☆14 · Dec 30, 2024 · Updated last year
- ☆33 · Jul 17, 2024 · Updated last year
- An application to simulate Tomasulo's algorithm ☆11 · Jan 16, 2014 · Updated 12 years ago
- A direct Convolution Neural Network implementation in pure C++, with MNIST dataset. ☆13 · Feb 11, 2015 · Updated 11 years ago
- ☆22 · Mar 5, 2024 · Updated 2 years ago
- FP8 flash attention implemented on the Ada architecture using the cutlass repository ☆82 · Aug 12, 2024 · Updated last year
- mHC kernels implemented in CUDA ☆257 · Mar 9, 2026 · Updated 3 weeks ago
- A simple Transformer where the softmax has been replaced with normalization ☆20 · Sep 11, 2020 · Updated 5 years ago
- Notes on understanding the tensorRT_Pro open-source project ☆22 · Feb 23, 2023 · Updated 3 years ago
- Artifact evaluation for HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele… ☆11 · Mar 3, 2024 · Updated 2 years ago
- MindVision camera driver ☆13 · Mar 13, 2018 · Updated 8 years ago
- ☆48 · Dec 11, 2020 · Updated 5 years ago
- ☆33 · Jul 23, 2024 · Updated last year
- ☆105 · Sep 9, 2024 · Updated last year
- A llama model inference framework implemented in CUDA C++ ☆65 · Nov 8, 2024 · Updated last year
- ☆40 · Sep 13, 2025 · Updated 6 months ago