graphcore-research / llm-inference-research
An experimentation platform for LLM inference optimisation
☆35 Sep 19, 2024 Updated last year
Alternatives and similar repositories for llm-inference-research
Users interested in llm-inference-research are comparing it to the libraries listed below
- 16-fold memory access reduction with nearly no loss ☆110 Mar 26, 2025 Updated 10 months ago
- ☆53 May 13, 2024 Updated last year
- ☆19 Mar 11, 2025 Updated 11 months ago
- ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25) ☆23 Sep 15, 2025 Updated 4 months ago
- ☆15 Jun 4, 2024 Updated last year
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆87 Nov 29, 2025 Updated 2 months ago
- Explore Inter-layer Expert Affinity in MoE Model Inference ☆16 May 6, 2024 Updated last year
- ☆38 Oct 10, 2024 Updated last year
- ☆64 May 16, 2025 Updated 8 months ago
- Hybrid methods for Parallel Betweenness Centrality on the GPU ☆24 Dec 20, 2018 Updated 7 years ago
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆283 May 1, 2025 Updated 9 months ago
- The implementation of the paper "Parallel Personalized PageRank on Dynamic Graphs" ☆25 Mar 1, 2018 Updated 7 years ago
- Codes of the paper "Speeding Up Set Intersections in Graph Algorithms using SIMD Instructions" that was published in SIGMOD 2018. Authors… ☆31 Jan 23, 2019 Updated 7 years ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆92 Jul 17, 2025 Updated 6 months ago
- MemLiner is a remote-memory-friendly runtime system. ☆31 Nov 1, 2022 Updated 3 years ago
- [SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference ☆82 Dec 7, 2025 Updated 2 months ago
- ☆25 Jan 2, 2021 Updated 5 years ago
- A C++11 implementation of the B-Tree part of "The Case for Learned Index Structures" ☆81 Jan 8, 2018 Updated 8 years ago
- Qwik JS with Supabase, NGINX, Stripe etc for SAAS ☆10 Jan 30, 2023 Updated 3 years ago
- ☆11 May 16, 2025 Updated 8 months ago
- Frog is Asynchronous Graph Processing on GPU with Hybrid Coloring Model. The fundamental idea is based on Pareto principle (or 80-20 rule… ☆36 May 29, 2021 Updated 4 years ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆56 Nov 20, 2024 Updated last year
- Full disclosure for http://stackoverflow.com/questions/17465061/how-to-parse-space-separated-floats-in-c-quickly/17479702 ☆11 Nov 6, 2016 Updated 9 years ago
- [EMNLP 2023] Question Answering as Programming for Solving Time-Sensitive Questions ☆12 Dec 18, 2023 Updated 2 years ago
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆52 Aug 6, 2025 Updated 6 months ago
- testbed for different SIMD implementations for set intersection and set union ☆41 Jan 29, 2020 Updated 6 years ago
- Fast C header-only library for popcnt, pospopcnt, and set algebraic operations ☆45 Dec 16, 2019 Updated 6 years ago
- An implementation of the maxflow algorithm by Yuri Boykov and Vladimir Kolmogorov. ☆12 Nov 26, 2014 Updated 11 years ago
- MAchine Micro Management UTilities ☆11 Nov 5, 2020 Updated 5 years ago
- Project of video synopsis ☆10 May 18, 2016 Updated 9 years ago
- Twitch Stream Analysis with Apache Spark and Apache Zeppelin ☆12 Aug 8, 2016 Updated 9 years ago
- Controlled Online Optimization Learning (COOL): Finding the Ground State of Spin Hamiltonians with Reinforcement Learning (arXiv:2003.000… ☆13 Jun 18, 2020 Updated 5 years ago
- The implementation for maximum clique enumeration algorithm ☆11 Apr 14, 2016 Updated 9 years ago
- Smart spinner component for Qwik, to manage the duration of loading states. ☆13 Sep 25, 2023 Updated 2 years ago
- An extension of the ProteoWizard framework enabling the support of the mzDB format ☆13 Feb 12, 2021 Updated 5 years ago
- Demo of fine-tuning QA models for answering FAQ of cloud providers documentation ☆11 Mar 7, 2023 Updated 2 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆11 Dec 13, 2023 Updated 2 years ago
- ☆12 Aug 26, 2022 Updated 3 years ago
- Exact Structural Graph Clustering ☆12 Mar 19, 2022 Updated 3 years ago