lyogavin / airllm
AirLLM 70B inference with single 4GB GPU
☆6,464 · Updated 3 months ago
Alternatives and similar repositories for airllm
Users interested in airllm are comparing it to the libraries listed below.
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,394 · Updated 2 weeks ago
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm ☆5,014 · Updated 8 months ago
- Tools for merging pretrained large language models ☆6,630 · Updated last week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs ☆7,437 · Updated this week
- Calculate tokens/s & GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ☆1,383 · Updated last year
- Go ahead and axolotl questions ☆11,005 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,570 · Updated 7 months ago
- The RedPajama-Data repository contains code for preparing large datasets for training large language models ☆4,905 · Updated last year
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,165 · Updated last year
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,995 · Updated last week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights ☆2,906 · Updated 2 years ago
- A blazing fast inference solution for text embeddings models ☆4,345 · Updated last week
- LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath ☆9,472 · Updated 6 months ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,795 · Updated last year
- PyTorch native post-training library ☆5,629 · Updated this week
- AllenAI's post-training codebase ☆3,474 · Updated this week
- Large Language Model Text Generation Inference ☆10,711 · Updated last week
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali ☆2,592 · Updated last week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: ☆2,300 · Updated 7 months ago
- [EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,701 · Updated 2 months ago
- A Next-Generation Training Engine Built for Ultra-Large MoE Models ☆5,031 · Updated this week
- Run any open-source LLMs, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud ☆12,011 · Updated last week
- H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://docs.h2o.ai/h2o-llmstudio/ ☆4,761 · Updated last week
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens ☆8,840 · Updated last year
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆3,804 · Updated this week
- Python bindings for llama.cpp ☆9,851 · Updated 4 months ago
- Structured Outputs ☆13,161 · Updated 2 weeks ago
- Chat language model that can use tools and interpret the results ☆1,589 · Updated 3 weeks ago
- Modeling, training, eval, and inference code for OLMo ☆6,245 · Updated last month
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆12,764 · Updated 3 months ago