Accelerating LLM inference with techniques like speculative decoding, quantization, and kernel fusion, focusing on implementing state-of-the-art research papers.
☆11Jul 1, 2025Updated 8 months ago
Alternatives and similar repositories for fast-llm-inference
Users who are interested in fast-llm-inference are comparing it to the repositories listed below
- Notes and code for Programming Massively Parallel Processors☆13Mar 29, 2025Updated 11 months ago
- Code for my ICLR 2024 TinyPapers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models"☆16May 26, 2023Updated 2 years ago
- Hierarchical Navigable Small World Graphs☆19Aug 17, 2024Updated last year
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exitin…☆68Jun 26, 2024Updated last year
- ☆12Mar 23, 2025Updated 11 months ago
- Minimal Chinese keyword extraction with BERT☆10Nov 1, 2022Updated 3 years ago
- Ethereum white paper, Taiwan Traditional Chinese edition☆14Oct 31, 2016Updated 9 years ago
- Using RAG to generate data for model fine-tuning.☆13Apr 16, 2025Updated 11 months ago
- A simple test for GAN☆10Mar 25, 2024Updated last year
- The art of the GitHub contributions heatmap.☆27Sep 23, 2021Updated 4 years ago
- How to quickly serve an LLM using FastAPI, Celery, and Redis☆17Aug 29, 2023Updated 2 years ago
- Large Language Models (LLMs) of Code☆20Apr 23, 2023Updated 2 years ago
- A web interface for SleekDB written in PHP☆11Jan 22, 2022Updated 4 years ago
- Ask Poddy: Run Open Source LLMs and Embeddings as OpenAI-Compatible Serverless Endpoints (Tutorial)☆11Jul 19, 2024Updated last year
- A package that gets baseball data☆13Mar 6, 2026Updated last week
- 白上フブキ Fan Site☆11Oct 29, 2022Updated 3 years ago
- ☆17Mar 22, 2023Updated 2 years ago
- ☆20Oct 4, 2018Updated 7 years ago
- A rural volunteer service mini program; national second-prize project in the 2022 WeChat Mini Program Contest☆26Apr 20, 2023Updated 2 years ago
- Blindspots in LLMs I've noticed while AI coding. Sonnet family emphasis.☆13Mar 20, 2025Updated 11 months ago
- My Gen AI research☆11Jun 3, 2024Updated last year
- diffusers with search engine☆12Jan 13, 2026Updated 2 months ago
- A simple website to manage your Hyper-V VMs and IIS sites☆12Jan 19, 2023Updated 3 years ago
- ☆14Sep 18, 2024Updated last year
- ☆14Mar 8, 2025Updated last year
- Framework for Algorithmic Correctness Testing of Operators☆16Mar 9, 2026Updated last week
- OpenMindedChatbot is a Proof Of Concept that leverages the power of Open source Large Language Models (LLM) with Function Calling capabil…☆30Dec 19, 2023Updated 2 years ago
- Resources for deep learning with satellite & aerial imagery☆14Sep 29, 2021Updated 4 years ago
- Finetuning a codegen model with a Python instruction set using the QLoRA technique for better efficacy☆11Aug 31, 2023Updated 2 years ago
- A vLLM proxy server that adds security and multi-model management for vLLM servers☆12May 30, 2024Updated last year
- ☆12Mar 7, 2024Updated 2 years ago
- An Offline and Secure Retrieval-Augmented Generation (RAG) system designed for efficient processing of diverse content types with minimal…☆20Dec 29, 2024Updated last year
- ☆11Updated this week
- open source version of Umbra☆17Aug 11, 2023Updated 2 years ago
- Distributed Online Service Coordination Using Deep Reinforcement Learning☆19Sep 4, 2023Updated 2 years ago
- Example of Langchain-Elasticsearch integrations & RAG.☆12Sep 20, 2024Updated last year
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the dion implementation.☆32Dec 5, 2025Updated 3 months ago
- A LLaMA2-7b chatbot with memory running on CPU, and optimized using smooth quantization, 4-bit quantization or Intel® Extension For PyTor…☆15Feb 27, 2024Updated 2 years ago
- A dual-chatbot system for learning languages based on LangChain☆13Jun 25, 2023Updated 2 years ago