Transformer related optimization, including BERT, GPT
☆17 (Jul 29, 2023, updated 2 years ago)
Alternatives and similar repositories for FasterTransformer
Users interested in FasterTransformer are comparing it to the libraries listed below.
- Transformer related optimization, including BERT, GPT (☆39, Feb 10, 2023, updated 3 years ago)
- Whisper in TensorRT-LLM (☆17, Sep 21, 2023, updated 2 years ago)
- qwen2 and llama3 cpp implementation (☆50, Jun 7, 2024, updated last year)
- ☆32 (Apr 19, 2025, updated 11 months ago)
- ☆13 (Mar 24, 2023, updated 3 years ago)
- stable diffusion, controlnet, tensorrt, accelerate (☆105, Apr 28, 2023, updated 2 years ago)
- ☆30 (Mar 27, 2023, updated 3 years ago)
- A suite for parallel inference of Diffusion Transformers (DiTs) on multi-GPU clusters (☆58, Jul 23, 2024, updated last year)
- ☆13 (May 25, 2023, updated 2 years ago)
- Operating systems course project for Computer Science/IoT at Northeastern University, China (☆12, Jun 27, 2019, updated 6 years ago)
- ☆70 (Dec 9, 2022, updated 3 years ago)
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) (☆30, Jan 22, 2026, updated 2 months ago)
- Lets coding agents use ncu (Nsight Compute) skills to analyze CUDA programs automatically! (☆82, Feb 5, 2026, updated 2 months ago)
- A llama model inference framework implemented in CUDA C++ (☆65, Nov 8, 2024, updated last year)
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 (☆479, Mar 15, 2024, updated 2 years ago)
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. (☆1,080, updated this week)
- ☆412 (Nov 11, 2023, updated 2 years ago)
- Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)" (☆34, Aug 12, 2024, updated last year)
- [ICML 2025] RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression (☆39, Aug 7, 2025, updated 8 months ago)
- Repository for the MICCAI 2022 AutoPET challenge (☆14, Sep 19, 2022, updated 3 years ago)
- ☆128 (Dec 24, 2024, updated last year)
- A lightweight UNet implementation using Keras (☆14, Jan 16, 2020, updated 6 years ago)
- Code repository for the NeurIPS 2024 paper "Toward Efficient Inference for Mixture of Experts" (☆19, Oct 30, 2024, updated last year)
- A high-throughput and memory-efficient inference and serving engine for LLMs (☆13, Oct 10, 2025, updated 6 months ago)
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs" (☆16, Sep 15, 2024, updated last year)
- C# library for decoding K2 transducer models, used in speech recognition (ASR) (☆13, Aug 20, 2025, updated 7 months ago)
- QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs. (☆154, Aug 21, 2025, updated 7 months ago)
- Multi-cluster monitoring and alerting with Thanos sidecar + MinIO (☆15, Feb 20, 2023, updated 3 years ago)
- YOLOv9 TensorRT inference, Python backend (☆35, Mar 16, 2024, updated 2 years ago)
- A parallel VAE that avoids OOM in high-resolution image generation (☆89, Mar 12, 2026, updated 3 weeks ago)
- ☆12 (May 20, 2020, updated 5 years ago)
- MICCAI 2022 paper reading notes: tasks and datasets (☆12, Feb 6, 2023, updated 3 years ago)
- Source code of the paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing" (☆31, Oct 24, 2024, updated last year)
- Named entity recognition in PyTorch: LSTM and LSTM_CRF (☆25, Aug 16, 2019, updated 6 years ago)
- https://mp.weixin.qq.com/s/7t0e_hfyDh1b2GPVlzXIMg or https://yq.aliyun.com/articles/636272 (☆11, Aug 31, 2018, updated 7 years ago)
- Closed-form solutions for logistic regression and single-layer softmax (☆12, Jul 29, 2021, updated 4 years ago)
- LLM API performance metrics compared: an in-depth analysis of TTFT, TPS, and other key metrics (☆20, Sep 12, 2024, updated last year)
- ☆14 (May 28, 2019, updated 6 years ago)
- ☆13 (Apr 4, 2023, updated 3 years ago)
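Several of the repos above are inference kernels. For orientation, the RMSNorm operation that the register/shared-memory CUDA project in this list accelerates reduces to a root-mean-square scaling over the last axis; a minimal NumPy sketch follows (the function name and `eps` default are illustrative, not taken from any listed repo):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale by the root-mean-square of the last axis,
    # with no mean subtraction (unlike LayerNorm), then apply a
    # learned per-channel weight.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[1.0, 2.0, 3.0]])
w = np.ones(3)
out = rms_norm(x, w)  # normalized so the row has unit RMS
```

A fast GPU version keeps `x` in registers and performs the mean-of-squares reduction in shared memory, which is the optimization that repo's description refers to.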