ccs96307/fast-llm-inference

Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.

☆11

Alternatives and similar repositories for fast-llm-inference

Users interested in fast-llm-inference are comparing it to the libraries listed below.

mandliya / PMPP_notes
Notes and code for Programming Massively Parallel Processors
☆13 · Mar 29, 2025 · Updated 11 months ago
Aaquib111 / Sparse-GPT-Finetuning
Code for my ICLR 2024 TinyPapers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models"
☆16 · May 26, 2023 · Updated 2 years ago
fogfish / hnsw
Hierarchical Navigable Small World Graphs
☆19 · Aug 17, 2024 · Updated last year
Equationliu / Kangaroo
[NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…
☆68 · Jun 26, 2024 · Updated last year
JacksonCakes / vision-r1
☆12 · Mar 23, 2025 · Updated 11 months ago
JacksonCakes / chinese_keybert
Minimal Chinese keyword extraction with BERT
☆10 · Nov 1, 2022 · Updated 3 years ago
ldkrsi / ethereum_wiki_zh_tw
Traditional Chinese (Taiwan) edition of the Ethereum whitepaper
☆14 · Oct 31, 2016 · Updated 9 years ago
m-newhauser / rag4rag
Using RAG to generate data for model fine-tuning.
☆13 · Apr 16, 2025 · Updated 11 months ago
ccs96307 / gan-mnist-pytorch-implemented
A simple test of a GAN
☆10 · Mar 25, 2024 · Updated last year
qinshuang1998 / GithubPainter
The art of the GitHub contributions heatmap.
☆27 · Sep 23, 2021 · Updated 4 years ago
AI-Maker-Space / FastAPI-LLM-Model-Serving
How to quickly serve an LLM using FastAPI, Celery, and Redis
☆17 · Aug 29, 2023 · Updated 2 years ago
wanghanbinpanda / Large-Language-Models-for-Code
Large Language Models (LLMs) of Code
☆20 · Apr 23, 2023 · Updated 2 years ago
matkalis / phpsleekdbadmin
A web interface for SleekDB written in PHP
☆11 · Jan 22, 2022 · Updated 4 years ago
blib-la / ask-poddy
Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial)
☆11 · Jul 19, 2024 · Updated last year
ss77995ss / baseball-stats-python
A package that gets baseball data
☆13 · Mar 6, 2026 · Updated last week
nevrending / foob
白上フブキ (Shirakami Fubuki) fan site
☆11 · Oct 29, 2022 · Updated 3 years ago
IrisLi17 / onpolicy_algorithm
☆17 · Mar 22, 2023 · Updated 2 years ago
pethoalpar / AndroidTessTwoOCR
☆20 · Oct 4, 2018 · Updated 7 years ago
5SSjw / Star-village-volunteer
A rural volunteer-service mini program; national second-prize project in the 2022 WeChat Mini Program Competition
☆26 · Apr 20, 2023 · Updated 2 years ago
ezyang / ai-blindspots
Blind spots in LLMs I've noticed while coding with AI, with an emphasis on the Sonnet family.
☆13 · Mar 20, 2025 · Updated 11 months ago
TroyDoesAI / AI_Research
My Gen AI research
☆11 · Jun 3, 2024 · Updated last year
suzukimain / auto_diffusers
Diffusers with a search engine
☆12 · Jan 13, 2026 · Updated 2 months ago
joszz / HyperVAdmin
A simple website to manage your Hyper-V VMs and IIS sites
☆12 · Jan 19, 2023 · Updated 3 years ago
dgg32 / age_vector
☆14 · Sep 18, 2024 · Updated last year
Deep-Learning-Profiling-Tools / triton-samples
☆14 · Mar 8, 2025 · Updated last year
meta-pytorch / FACTO
Framework for Algorithmic Correctness Testing of Operators
☆16 · Mar 9, 2026 · Updated last week
mourad-ghafiri / OpenMindedChatbot
OpenMindedChatbot is a proof of concept that leverages open-source Large Language Models (LLMs) with Function Calling capabil…
☆30 · Dec 19, 2023 · Updated 2 years ago
SpaceNetLab / satellite-image-deep-learning
Resources for deep learning with satellite & aerial imagery
☆14 · Sep 29, 2021 · Updated 4 years ago
abvijaykumar / python-lora-finetuning
Fine-tuning a codegen model on a Python instruction set using the QLoRA technique for better efficacy
☆11 · Aug 31, 2023 · Updated 2 years ago
ParisNeo / vllm_proxy_server
A vLLM proxy server that adds security and multi-model management to vLLM servers
☆12 · May 30, 2024 · Updated last year
vdesai2014 / diffusion-policy-accelerated
☆12 · Mar 7, 2024 · Updated 2 years ago
Abdoulaye-Sayouti / Secure-Offline-RAG-System
An offline and secure Retrieval-Augmented Generation (RAG) system designed for efficient processing of diverse content types with minimal…
☆20 · Dec 29, 2024 · Updated last year
rkuo2000 / GenAI
☆11 · Updated this week
ConnectedSystemsLab / Umbra
Open-source version of Umbra
☆17 · Aug 11, 2023 · Updated 2 years ago
RealVNF / distributed-drl-coordination
Distributed Online Service Coordination Using Deep Reinforcement Learning
☆19 · Sep 4, 2023 · Updated 2 years ago
ashishtiwari1993 / langchain-elasticsearch-RAG
Example of LangChain-Elasticsearch integrations & RAG.
☆12 · Sep 20, 2024 · Updated last year
thib-s / flash-newton-schulz
My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation.
☆32 · Dec 5, 2025 · Updated 3 months ago
aahouzi / llama2-chatbot-cpu
A LLaMA2-7b chatbot with memory running on CPU, and optimized using smooth quantization, 4-bit quantization or Intel® Extension For PyTor…
☆15 · Feb 27, 2024 · Updated 2 years ago
ShuaiGuo16 / language_learning_app
A dual-chatbot system for learning languages based on LangChain
☆13 · Jun 25, 2023 · Updated 2 years ago