lyogavin / airllm
AirLLM 70B inference with single 4GB GPU
☆6,311 · Updated 2 months ago
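A minimal usage sketch of the layer-by-layer loading approach the headline describes, assuming the `AutoModel` API shown in the airllm README; the checkpoint ID and generation parameters here are illustrative, not prescriptive.

```python
# Sketch: AirLLM runs a 70B model on a small GPU by loading one transformer
# layer at a time, so the full weights never reside in GPU memory at once.
from airllm import AutoModel

# Illustrative checkpoint; any supported Llama-family 70B model ID should work.
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

input_text = ["What is the capital of the United States?"]
input_tokens = model.tokenizer(
    input_text,
    return_tensors="pt",
    truncation=True,
    max_length=128,
)

generation_output = model.generate(
    input_tokens["input_ids"].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True,
)
print(model.tokenizer.decode(generation_output.sequences[0]))
```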
Alternatives and similar repositories for airllm
Users interested in airllm are comparing it to the libraries listed below
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.☆4,983 · Updated 6 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs☆4,354 · Updated 2 months ago
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs.☆7,223 · Updated this week
- Tools for merging pretrained large language models.☆6,412 · Updated last week
- Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization☆1,380 · Updated 11 months ago
- High-speed Large Language Model Serving for Local Deployment☆8,374 · Updated 3 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:☆2,261 · Updated 5 months ago
- LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath☆9,458 · Updated 5 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.☆2,902 · Updated 2 years ago
- Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).☆7,101 · Updated last week
- [ICLR 2025] LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs☆1,780 · Updated 4 months ago
- Large Language Model Text Generation Inference☆10,621 · Updated last month
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)☆6,665 · Updated 4 months ago
- Retrieval and Retrieval-augmented LLMs☆10,772 · Updated 2 weeks ago
- A lightweight framework for building LLM-based agents☆2,199 · Updated 3 months ago
- Go ahead and axolotl questions☆10,716 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs☆3,528 · Updated 5 months ago
- Enforce the output format (JSON Schema, Regex etc) of a language model☆1,944 · Updated 2 months ago
- A blazing fast inference solution for text embeddings models☆4,156 · Updated last week
- A series of large language models trained from scratch by developers @01-ai☆7,845 · Updated 11 months ago
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate☆7,408 · Updated 3 months ago
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, clip, clap and colpali☆2,532 · Updated last week
- A Next-Generation Training Engine Built for Ultra-Large MoE Models☆4,963 · Updated this week
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.☆8,784 · Updated last year
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.☆12,883 · Updated last week
- ☆3,035 · Updated last year
- [EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach…☆5,542 · Updated last week
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality☆4,376 · Updated last year
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding☆2,257 · Updated 5 months ago
- H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs. Documentation: https://docs.h2o.ai/h2o-llmstudio/☆4,702 · Updated last month