UbiquitousLearning / mllm
Fast Multimodal LLM on Mobile Devices
☆509 · Updated this week
Related projects
Alternatives and complementary repositories for mllm
- Awesome LLMs on Device: A Comprehensive Survey ☆908 · Updated last month
- Survey Paper List - Efficient LLM and Foundation Models ☆217 · Updated last month
- A curated list for Efficient Large Language Models ☆1,245 · Updated last week
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ☆310 · Updated 2 months ago
- Paper list for Personal LLM Agents ☆331 · Updated 6 months ago
- [NeurIPS'24 Spotlight] To speed up long-context LLMs' inference, approximate and dynamic sparse attention calculation reduces in… ☆776 · Updated this week
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆315 · Updated this week
- Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24) ☆817 · Updated last week
- FlashInfer: Kernel Library for LLM Serving ☆1,399 · Updated this week
- TinyChatEngine: On-Device LLM Inference Library ☆743 · Updated 4 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆350 · Updated 2 months ago
- Low-bit LLM inference on CPU with lookup table ☆563 · Updated 2 weeks ago
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆732 · Updated this week
- ☆284 · Updated 7 months ago
- Large Language Model (LLM) Systems Paper List ☆636 · Updated this week
- Fast inference from large language models via speculative decoding ☆562 · Updated 2 months ago
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving ☆434 · Updated this week
- A large-scale simulation framework for LLM inference ☆271 · Updated last month
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs. ☆724 · Updated last month
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆2,589 · Updated this week
- Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads ☆2,297 · Updated 4 months ago
- Awesome LLM compression research papers and tools. ☆1,177 · Updated this week
- 📖 A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batc… ☆2,795 · Updated last week
- A throughput-oriented high-performance serving framework for LLMs ☆629 · Updated last month
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆541 · Updated 3 weeks ago
- 📰 Must-read papers and blogs on Speculative Decoding ⚡️ ☆451 · Updated this week
- The homepage of the OneBit model quantization framework. ☆156 · Updated 4 months ago
- [NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Supports Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich… ☆864 · Updated last month
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆236 · Updated 7 months ago
- llm-export can export LLM models to ONNX. ☆226 · Updated last week
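Several of the entries above (EAGLE, Medusa, the speculative-decoding paper list) build on the same core idea: a cheap draft model proposes several tokens ahead, and the expensive target model verifies them in a single batched pass, keeping the longest agreeing prefix. A minimal greedy sketch of that loop, using toy stand-in models (`draft_model`, `target_model`, and the mod-10 token rule are all hypothetical placeholders, not any repo's actual API):

```python
# Toy sketch of greedy speculative decoding. In a real system the draft
# model is a small LLM and the target model a large one; here both are
# trivial deterministic functions so the control flow is easy to follow.

def draft_model(prefix):
    # Hypothetical cheap proposer: guess that tokens count up mod 10.
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Hypothetical expensive verifier: mostly agrees with the draft,
    # but wraps to 0 after token 4, so drafts are sometimes rejected.
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10

def speculative_decode(prefix, n_new, k=4):
    """Generate n_new tokens: draft k at a time, keep the longest
    prefix the target model agrees with, then append one corrected token."""
    out = list(prefix)
    target_len = len(prefix) + n_new
    while len(out) < target_len:
        # 1) Draft k candidate tokens cheaply, feeding each back in.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify the drafts with the target model. In a real engine
        #    this is one batched forward pass -- the source of the speed-up.
        accepted = 0
        for i, t in enumerate(draft):
            if target_model(out + draft[:i]) == t:
                accepted += 1
            else:
                break
        out.extend(draft[:accepted])
        # 3) On the first mismatch, take the target model's token instead,
        #    so every step still makes progress.
        if len(out) < target_len and accepted < k:
            out.append(target_model(out))
    return out[len(prefix):target_len]
```

Because every emitted token is either verified or produced by the target model, the output matches plain greedy decoding with the target model alone; the draft model only changes how many target calls are needed, not the result.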