An experimentation platform for LLM inference optimisation
☆36 | Sep 19, 2024 | Updated last year
Alternatives and similar repositories for llm-inference-research
Users who are interested in llm-inference-research are comparing it to the repositories listed below.
- 16-fold memory access reduction with nearly no loss ☆108 | Mar 26, 2025 | Updated 11 months ago
- ☆52 | May 13, 2024 | Updated last year
- ☆18 | Mar 11, 2025 | Updated 11 months ago
- ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25) ☆25 | Feb 26, 2026 | Updated last week
- ☆302 | Jul 10, 2025 | Updated 7 months ago
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆88 | Nov 29, 2025 | Updated 3 months ago
- Explore Inter-layer Expert Affinity in MoE Model Inference ☆16 | May 6, 2024 | Updated last year
- ☆36 | Oct 10, 2024 | Updated last year
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆374 | Jul 10, 2025 | Updated 7 months ago
- Scaling Up Subgraph Query Processing with Efficient Subgraph Matching by Shixuan Sun and Dr. Qiong Luo ☆18 | Nov 24, 2018 | Updated 7 years ago
- Hybrid methods for Parallel Betweenness Centrality on the GPU ☆24 | Dec 20, 2018 | Updated 7 years ago
- The implementation of the paper "Parallel Personalized PageRank on Dynamic Graphs" ☆25 | Mar 1, 2018 | Updated 8 years ago
- Codes of the paper "Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions" that was published in SIGMOD 2018. Authors… ☆31 | Jan 23, 2019 | Updated 7 years ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆91 | Jul 17, 2025 | Updated 7 months ago
- Code for monograph "Cohesive Subgraph Computation over Large Sparse Graphs" ☆26 | Apr 24, 2022 | Updated 3 years ago
- MemLiner is a remote-memory-friendly runtime system. ☆31 | Nov 1, 2022 | Updated 3 years ago
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference ☆83 | Dec 7, 2025 | Updated 2 months ago
- ☆25 | Jan 2, 2021 | Updated 5 years ago
- A C++11 implementation of the B-Tree part of "The Case for Learned Index Structures" ☆81 | Jan 8, 2018 | Updated 8 years ago
- ☆16 | Feb 2, 2021 | Updated 5 years ago
- ☆95 | Nov 25, 2024 | Updated last year
- Frog is Asynchronous Graph Processing on GPU with Hybrid Coloring Model. The fundamental idea is based on Pareto principle (or 80-20 rule… ☆36 | May 29, 2021 | Updated 4 years ago
- Dynamic Context Selection for Efficient Long-Context LLMs ☆56 | May 20, 2025 | Updated 9 months ago
- Full disclosure for http://stackoverflow.com/questions/17465061/how-to-parse-space-separated-floats-in-c-quickly/17479702 ☆11 | Nov 6, 2016 | Updated 9 years ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆57 | Nov 20, 2024 | Updated last year
- [EMNLP 2023] Question Answering as Programming for Solving Time-Sensitive Questions ☆12 | Dec 18, 2023 | Updated 2 years ago
- Big Data and Machine Intelligence, Spring 2021. ☆12 | Jul 2, 2021 | Updated 4 years ago
- Fast C header-only library for popcnt, pospopcnt, and set algebraic operations ☆45 | Dec 16, 2019 | Updated 6 years ago
- Smart spinner component for Qwik, to manage the duration of loading states. ☆13 | Sep 25, 2023 | Updated 2 years ago
- Demo of fine-tuning QA models for answering FAQ of cloud providers documentation ☆11 | Mar 7, 2023 | Updated 3 years ago
- ☆10 | Jun 28, 2019 | Updated 6 years ago
- OEBench: Investigating Open Environment Challenges in Real-World Relational Data Streams (VLDB 2024) ☆13 | Aug 27, 2024 | Updated last year
- Codebase for the EMNLP 2021 paper "HittER: Hierarchical Transformers for Knowledge Graph Embeddings". ☆12 | Nov 1, 2021 | Updated 4 years ago
- ☆12 | Aug 26, 2022 | Updated 3 years ago
- An implementation of the maxflow algorithm by Yuri Boykov and Vladimir Kolmogorov. ☆12 | Nov 26, 2014 | Updated 11 years ago
- Ever wondered how popular your GitHub repo is compared to others? ☆16 | Feb 14, 2026 | Updated 3 weeks ago
- A Neural Two-Stage Approach for Recognizing Discontiguous Entities (EMNLP 2019) ☆11 | Aug 27, 2019 | Updated 6 years ago
- Artifact of paper "Exploiting Recent SIMD Architectural Advances for Irregular Applications" ☆11 | Jun 23, 2016 | Updated 9 years ago
- Memory-mapped VGA display for Xilinx/Zynq/Zedboard, with demo code for using it. ☆15 | Feb 26, 2018 | Updated 8 years ago