lyogavin / airllm
AirLLM 70B inference with single 4GB GPU
☆11,052 · Updated 5 months ago
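AirLLM's headline claim (70B inference on a 4 GB GPU) rests on layer-by-layer execution: only one transformer layer's weights are resident in accelerator memory at a time, streamed from disk, used, and freed before the next layer loads. The following is a toy sketch of that idea in plain Python, not airllm's actual API; all names and the dict-as-disk store are illustrative.

```python
def layered_forward(x, layer_ids, load_layer):
    """Run a deep model while holding only one layer's weights in memory.

    layer_ids: identifiers for each layer's weights in external storage;
    load_layer: callback that loads a single layer's weights on demand.
    """
    for lid in layer_ids:
        w = load_layer(lid)       # stream this layer's weights in
        x = [w * v for v in x]    # apply the (toy scalar) layer
        del w                     # release before loading the next layer
    return x

# Toy usage: "disk" is a dict; each layer is a scalar weight.
disk = {0: 2.0, 1: 3.0, 2: 0.5}
out = layered_forward([1.0, 1.0], list(disk), disk.__getitem__)
# out == [3.0, 3.0]  (1 * 2 * 3 * 0.5 per element)
```

The trade-off is clear from the loop: peak memory drops to one layer's worth, but every forward pass pays the full weight-loading I/O cost, which is why this approach targets feasibility on small GPUs rather than throughput.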
Alternatives and similar repositories for airllm
Users interested in airllm are comparing it to the libraries listed below.
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,606 · Updated this week
- Go ahead and axolotl questions ☆11,251 · Updated last week
- An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm. ☆5,028 · Updated 10 months ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,718 · Updated 8 months ago
- Tools for merging pretrained large language models. ☆6,783 · Updated 2 weeks ago
- Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ☆1,389 · Updated last year
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,440 · Updated 2 months ago
- Large Language Model Text Generation Inference ☆10,757 · Updated last month
- High-speed Large Language Model Serving for Local Deployment ☆8,635 · Updated 2 weeks ago
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆13,234 · Updated last week
- A Next-Generation Training Engine Built for Ultra-Large MoE Models ☆5,082 · Updated last week
- OpenChat: Advancing Open-source Language Models with Imperfect Data ☆5,472 · Updated last year
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,835 · Updated last year
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆3,888 · Updated this week
- A blazing fast inference solution for text embeddings models ☆4,476 · Updated last week
- Python bindings for llama.cpp ☆9,971 · Updated 5 months ago
- SGLang is a high-performance serving framework for large language models and multimodal models. ☆23,439 · Updated this week
- 🤗 AutoTrain Advanced ☆4,555 · Updated 2 weeks ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,314 · Updated 9 months ago
- PyTorch native post-training library ☆5,669 · Updated this week
- Retrieval and Retrieval-augmented LLMs ☆11,280 · Updated last month
- Welcome to the Llama Cookbook! This is your go to guide for Building with Llama: Getting started with Inference, Fine-Tuning, RAG. We als… ☆18,190 · Updated 3 months ago
- Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud. ☆12,099 · Updated 2 weeks ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,182 · Updated last year
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration ☆3,436 · Updated 6 months ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆13,155 · Updated this week
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which ach… ☆5,823 · Updated 3 months ago
- Tensor library for machine learning ☆13,923 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. ☆7,939 · Updated 3 weeks ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆4,581 · Updated last year
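Several of the repositories above (GPTQ, AWQ, QLoRA, bitsandbytes, and the memory-requirement calculator) revolve around the same back-of-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. A minimal sketch of that estimate follows; the function name is illustrative, and it covers weights only, ignoring the KV cache and activations that real calculators also account for.

```python
def weight_memory_gib(n_params_billion, bits_per_param):
    """Rough GPU memory for model weights alone: params x bytes/param.

    Ignores KV cache, activations, and framework overhead, which add
    substantially on top in real deployments.
    """
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 2**30  # convert bytes to GiB

# A 70B model: ~130 GiB of weights in fp16, ~33 GiB at 4-bit.
fp16 = weight_memory_gib(70, 16)  # ≈ 130.4 GiB
q4 = weight_memory_gib(70, 4)     # ≈ 32.6 GiB
```

This is why a 70B model is out of reach for a single consumer GPU even at 4-bit: the quantized weights alone exceed any 24 GB card, motivating both the layer-streaming approach of airllm and the multi-GPU serving stacks listed above.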