Lizonghang / prima.cpp
prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters
☆260 · Updated this week
Alternatives and similar repositories for prima.cpp:
Users interested in prima.cpp are comparing it to the repositories listed below.
- TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices ☆176 · Updated 5 months ago
- CPU inference for the DeepSeek family of large language models in C++ ☆288 · Updated this week
- llama3.cuda is a pure C/CUDA implementation for the Llama 3 model. ☆331 · Updated 10 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆292 · Updated this week
- Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O ☆282 · Updated 3 months ago
- Efficient LLM Inference over Long Sequences ☆368 · Updated this week
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆626 · Updated 3 weeks ago
- prime is a framework for efficient, globally distributed training of AI models over the internet. ☆701 · Updated this week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆573 · Updated this week
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆486 · Updated 3 months ago
- Advanced quantization algorithm for LLMs/VLMs ☆431 · Updated this week
- ☆85 · Updated last month
- A fast communication-overlapping library for tensor/expert parallelism on GPUs ☆887 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆352 · Updated 7 months ago
- Big & small LLMs working together ☆708 · Updated this week
- Autonomously train research-agent LLMs on custom data using reinforcement learning and self-verification ☆595 · Updated 3 weeks ago
- ☆207 · Updated 2 months ago
- Gemma 2 optimized for your local machine ☆367 · Updated 8 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆794 · Updated 7 months ago
- [NeurIPS'24 Spotlight, ICLR'25] To speed up long-context LLMs' inference, approximate and dynamic sparse calculation of the attention, which r… ☆971 · Updated this week
- Awesome Mobile LLMs ☆166 · Updated 3 weeks ago
- Implementing DeepSeek R1's GRPO algorithm from scratch ☆445 · Updated this week
- A simple tool that lets you explore different possible paths that an LLM might sample ☆161 · Updated last week
- LLM inference on consumer devices ☆105 · Updated last month
- Serverless LLM serving for everyone ☆458 · Updated this week
- Run LLMs with MLX ☆421 · Updated this week
- ☆140 · Updated 2 months ago
- Low-bit LLM inference on CPU with lookup table ☆720 · Updated 3 months ago
- Fast parallel LLM inference for MLX ☆181 · Updated 9 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆1,746 · Updated 2 weeks ago