☆172 Dec 2, 2025 Updated 3 months ago
Alternatives and similar repositories for Awesome-LLM-Inference-Engine
Users who are interested in Awesome-LLM-Inference-Engine are comparing it to the libraries listed below.
- ☆47 Apr 29, 2025 Updated 10 months ago
- Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025 ☆16 Nov 24, 2024 Updated last year
- ☆22 Mar 7, 2025 Updated 11 months ago
- 🎮 Manipulates mobile phones just like how you would. Official code for "MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficien… ☆27 Oct 10, 2025 Updated 4 months ago
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning ☆32 Jun 13, 2025 Updated 8 months ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation" ☆28 Jun 23, 2025 Updated 8 months ago
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 Dec 13, 2024 Updated last year
- ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities ☆15 Feb 11, 2025 Updated last year
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ☆13 Sep 28, 2025 Updated 5 months ago
- ☆19 Sep 10, 2025 Updated 5 months ago
- A High-Throughput Multi-GPU System for Graph-Based Approximate Nearest Neighbor Search ☆21 Jul 22, 2025 Updated 7 months ago
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ☆14 Mar 17, 2025 Updated 11 months ago
- [Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models ☆20 Oct 2, 2024 Updated last year
- ☆27 Jun 5, 2025 Updated 8 months ago
- xKV: Cross-Layer SVD for KV-Cache Compression ☆44 Nov 30, 2025 Updated 3 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆19 Mar 10, 2025 Updated 11 months ago
- List of papers related to Vision Transformers quantization and hardware acceleration in recent AI conferences and journals. ☆102 Jun 2, 2024 Updated last year
- ☆51 Apr 30, 2025 Updated 10 months ago
- ☆43 May 29, 2025 Updated 9 months ago
- The official implementation of the DAC 2024 paper GQA-LUT ☆20 Dec 20, 2024 Updated last year
- [EMNLP 2025] Code for paper "Table-R1: Inference-Time Scaling for Table Reasoning" ☆29 Jun 3, 2025 Updated 9 months ago
- [ICLR 2026] Official repository of "InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models". ☆91 Feb 6, 2026 Updated 3 weeks ago
- ☆18 Jan 17, 2024 Updated 2 years ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆48 May 10, 2024 Updated last year
- JUPITER Benchmark Suite ☆23 Jul 18, 2025 Updated 7 months ago
- Official Implementation of "DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucination" ☆28 Dec 18, 2024 Updated last year
- ☆22 Jun 10, 2025 Updated 8 months ago
- ☆17 Aug 1, 2025 Updated 7 months ago
- Documenting my LeetCode training journey during my doctoral studies. ☆23 Aug 31, 2023 Updated 2 years ago
- ☆52 Mar 17, 2025 Updated 11 months ago
- [ICLR 2025] Official Pytorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia… ☆29 Jul 24, 2025 Updated 7 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆59 Mar 17, 2025 Updated 11 months ago
- The official repository of Quamba1 [ICLR 2025] & Quamba2 [ICML 2025] ☆67 Jun 19, 2025 Updated 8 months ago
- ☆41 May 27, 2025 Updated 9 months ago
- ☆49 Aug 14, 2025 Updated 6 months ago
- The official implementation of Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight ☆82 Jan 16, 2026 Updated last month
- COCCL: Compression and precision co-aware collective communication library ☆30 Mar 16, 2025 Updated 11 months ago
- Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark ☆28 Apr 22, 2025 Updated 10 months ago
- A curated list of research papers, resources, and advancements on Diffusion Cache and related efficient diffusion model acceleration tech… ☆73 Nov 4, 2025 Updated 3 months ago