lyogavin / airllm
AirLLM: 70B inference with a single 4GB GPU
☆5,758 · Updated 4 months ago
Alternatives and similar repositories for airllm:
Users interested in airllm are comparing it to the libraries listed below.
- Tools for merging pretrained large language models. ☆5,571 · Updated this week
- [EMNLP'23, ACL'24] To speed up LLM inference and enhance the LLM's perception of key information, compress the prompt and KV-Cache, which ach… ☆5,034 · Updated last month
- Go ahead and axolotl questions ☆9,165 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆13,368 · Updated this week
- Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension. ☆6,564 · Updated last week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,642 · Updated last week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆3,817 · Updated 8 months ago
- Supercharge Your LLM Application Evaluations 🚀 ☆8,860 · Updated last week
- Curated list of datasets and tools for post-training. ☆2,948 · Updated 2 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,125 · Updated this week
- Knowledge Agents and Management in the Cloud ☆3,885 · Updated last week
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆6,323 · Updated 3 months ago
- An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...) ☆4,497 · Updated last week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆2,955 · Updated this week
- ☆2,915 · Updated 7 months ago
- Ollama Python library ☆7,374 · Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆6,859 · Updated 9 months ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆12,004 · Updated this week
- Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sag… ☆20,983 · Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆6,183 · Updated this week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆24,664 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. ☆6,932 · Updated this week
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,394 · Updated 10 months ago
- Build resilient language agents as graphs. ☆11,784 · Updated this week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆4,818 · Updated last week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,104 · Updated 2 weeks ago
- Finetune Llama 4, DeepSeek-R1, Gemma 3 & Reasoning LLMs 2x faster with 70% less memory! 🦥 ☆37,364 · Updated this week
- [NeurIPS'24] HippoRAG is a novel RAG framework inspired by human long-term memory that enables LLMs to continuously integrate knowledge a… ☆2,254 · Updated last week
- Structured Text Generation ☆11,415 · Updated this week
- Robust recipes to align language models with human and AI preferences ☆5,138 · Updated 5 months ago