SqueezeBits / Torch-TRTLLM
Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
☆55 · Jul 16, 2025 · Updated 6 months ago
Alternatives and similar repositories for Torch-TRTLLM
Users that are interested in Torch-TRTLLM are comparing it to the libraries listed below
- OwLite is a low-code AI model compression toolkit for AI models. ☆52 · Nov 14, 2025 · Updated 3 months ago
- [ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆39 · Feb 4, 2025 · Updated last year
- Persistent dense gemm for Hopper in `CuTeDSL` ☆15 · Aug 9, 2025 · Updated 6 months ago
- CPython deep-dive study group ☆16 · Jul 13, 2024 · Updated last year
- Implementation of the contextual biasing for ASR decoding on GPUs without lattice generation. The code supports submission to Interspeech… ☆21 · Sep 25, 2023 · Updated 2 years ago
- Official Implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration ☆29 · Nov 22, 2025 · Updated 2 months ago
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration ☆20 · Jun 27, 2025 · Updated 7 months ago
- Provides the examples to write and build Habana custom kernels using the HabanaTools ☆25 · Apr 15, 2025 · Updated 9 months ago
- A framework to compare low-bit integer and float-point formats ☆66 · Feb 6, 2026 · Updated last week
- AFPQ code implementation ☆23 · Nov 6, 2023 · Updated 2 years ago
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆30 · Oct 20, 2025 · Updated 3 months ago
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification. ☆26 · Apr 15, 2025 · Updated 10 months ago
- 🚘 PseudoLab (가짜연구소) 10th cohort: '3D perception for Autonomous Driving' ☆16 · Apr 26, 2025 · Updated 9 months ago
- ☆11 · Aug 3, 2025 · Updated 6 months ago
- Workshop materials for AI Engineer World's Fair ☆13 · Jun 3, 2025 · Updated 8 months ago
- ☆52 · Nov 5, 2024 · Updated last year
- Generic library for neural collapse and several derivative works on the phenomenon. ☆18 · Apr 14, 2025 · Updated 10 months ago
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation ☆25 · Jun 16, 2025 · Updated 7 months ago
- Information-oriented Metric (IOM) ☆11 · Sep 2, 2020 · Updated 5 years ago
- Repository for the DPP'23 course ☆11 · May 2, 2024 · Updated last year
- ☆13 · Dec 13, 2022 · Updated 3 years ago
- ☆93 · Updated this week
- Tiny configuration for Triton Inference Server ☆45 · Jan 10, 2025 · Updated last year
- Translations of 湾区日报 (Bay Area Daily) ☆12 · Nov 16, 2022 · Updated 3 years ago
- Infrastructure useful to create natural language processing systems based on transformer networks ☆12 · Sep 26, 2019 · Updated 6 years ago
- ☆11 · Feb 22, 2022 · Updated 3 years ago
- Sensor signal classification based on DeepConvLSTM ☆11 · May 15, 2018 · Updated 7 years ago
- Official implementation of "Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent". ☆21 · May 23, 2025 · Updated 8 months ago
- beproud bot system ☆14 · Aug 17, 2023 · Updated 2 years ago
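- iOS app security: network certificate validation, anti-debugging, anti-injection ☆14 · Jan 17, 2020 · Updated 6 years ago
- Free use of the 米扑 proxy service (beta version released) ☆10 · Apr 19, 2018 · Updated 7 years ago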
- Repo for the 9th-cohort organizing team. ☆12 · Sep 22, 2024 · Updated last year
- Parallel Self-Adjusting Computation ☆15 · Jul 5, 2021 · Updated 4 years ago
- Official Implementation of Robustifying and Boosting Training-Free Neural Architecture Search ☆10 · Mar 12, 2024 · Updated last year
- Various data structure implementations in Python ☆11 · Jul 18, 2019 · Updated 6 years ago
- [ICLR 2025] Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception ☆14 · Jul 4, 2025 · Updated 7 months ago
- ☆11 · Apr 5, 2023 · Updated 2 years ago
- Ray and Anyscale for UC Berkeley AI Hackathon! ☆11 · Jun 17, 2023 · Updated 2 years ago
- ☆11 · Jun 17, 2024 · Updated last year