Penn CIS 5650 (GPU Programming and Architecture) Final Project
☆44 · Dec 11, 2023 · Updated 2 years ago
Alternatives and similar repositories for Efficient-LLM-Inferencing-on-GPUs
Users who are interested in Efficient-LLM-Inferencing-on-GPUs are comparing it to the libraries listed below.
- ☆10 · Oct 8, 2021 · Updated 4 years ago
- ☆11 · May 16, 2026 · Updated last week
- 🎓 Automatically update circult-eda-mlsys-tinyml papers daily using GitHub Actions (updated every 8 hours) ☆10 · May 18, 2026 · Updated last week
- ☆14 · Nov 3, 2025 · Updated 6 months ago
- Manage my starred projects on GitHub ☆11 · Jul 23, 2020 · Updated 5 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆110 · Jul 27, 2018 · Updated 7 years ago
- ☆13 · Jun 23, 2022 · Updated 3 years ago
- ☆29 · Oct 20, 2019 · Updated 6 years ago
- Make Triton easier ☆50 · Jun 12, 2024 · Updated last year
- PyTorch implementation of GRPO. ☆16 · Apr 21, 2025 · Updated last year
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆31 · Dec 21, 2024 · Updated last year
- This repository deploys the Nanodet detection algorithm on the OpenVINO inference framework and rewrites the preprocessing and postprocessing stages for very high performance, making detection on Intel CPU platforms dramatically faster. The model is also quantized (PTQ) to int8 precision with the NNCF and PPQ tools for even faster inference. ☆16 · Jun 14, 2023 · Updated 2 years ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆120 · Mar 13, 2024 · Updated 2 years ago
- A lightweight inference framework for large language models ☆23 · May 26, 2025 · Updated last year
- RISCV C and Triton AI-Benchmark ☆25 · Jan 28, 2026 · Updated 4 months ago
- A practical way of learning Swizzle ☆39 · Feb 3, 2025 · Updated last year
- An ethnic culture dataset for large language models ☆12 · May 26, 2025 · Updated last year
- [WSDM 2026] LookAhead Tuning: Safer Language Models via Partial Answer Previews ☆17 · Dec 14, 2025 · Updated 5 months ago
- 1st Place Solution to iWildcam 2021: Count the number of animals of each species present in a sequence of images ☆12 · Jun 24, 2021 · Updated 4 years ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆218 · Sep 21, 2024 · Updated last year
- Optimize softmax in Triton for many cases ☆24 · Sep 6, 2024 · Updated last year
- Step-by-step SGEMM optimization with CUDA ☆22 · Mar 23, 2024 · Updated 2 years ago
- Deep Introspective SLAM: Deep Reinforcement Learning based Approach to Avoid Tracking Failure in Visual SLAM ☆11 · Jul 31, 2021 · Updated 4 years ago
- EESAST 2020 summer training ☆28 · Jan 24, 2023 · Updated 3 years ago
- Implementation of AdaCQR (COLING 2025) ☆15 · Dec 30, 2024 · Updated last year
- ☆32 · Jul 17, 2024 · Updated last year
- An application to simulate Tomasulo's algorithm ☆11 · Jan 16, 2014 · Updated 12 years ago
- A direct Convolution Neural Network implementation in pure C++, with MNIST dataset. ☆13 · Feb 11, 2015 · Updated 11 years ago
- ☆22 · Mar 5, 2024 · Updated 2 years ago
- An FP8 flash attention implementation for the Ada architecture, built with the cutlass repository ☆81 · Aug 12, 2024 · Updated last year
- A simple Transformer where the softmax has been replaced with normalization ☆20 · Sep 11, 2020 · Updated 5 years ago
- InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference ☆17 · Mar 30, 2025 · Updated last year
- Artifact evaluation for HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele… ☆11 · Mar 3, 2024 · Updated 2 years ago
- ☆48 · Dec 11, 2020 · Updated 5 years ago
- ☆33 · Jul 23, 2024 · Updated last year
- ☆106 · Sep 9, 2024 · Updated last year
- A llama model inference framework implemented in CUDA C++ ☆65 · Nov 8, 2024 · Updated last year
- ☆46 · Sep 13, 2025 · Updated 8 months ago
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆53 · Aug 6, 2025 · Updated 9 months ago