lyogavin / airllm
AirLLM 70B inference with single 4GB GPU
☆5,901 · Updated 3 months ago
Alternatives and similar repositories for airllm
Users who are interested in airllm are comparing it to the libraries listed below.
- Tools for merging pretrained large language models. ☆6,195 · Updated last week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆4,922 · Updated 4 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,279 · Updated last week
- OpenChat: Advancing Open-source Language Models with Imperfect Data ☆5,418 · Updated 11 months ago
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆6,879 · Updated this week
- Go ahead and axolotl questions ☆10,245 · Updated this week
- An efficient, flexible and full-featured toolkit for fine-tuning LLMs (InternLM2, Llama3, Phi3, Qwen, Mistral, ...) ☆4,701 · Updated last week
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,617 · Updated last year
- Large Language Model Text Generation Inference ☆10,424 · Updated last week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆2,227 · Updated 3 months ago
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,359 · Updated 5 months ago
- A blazing fast inference solution for text embeddings models ☆3,912 · Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,849 · Updated 2 weeks ago
- Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ☆1,344 · Updated 8 months ago
- SGLang is a fast serving framework for large language models and vision language models. ☆16,953 · Updated last week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,894 · Updated last year
- A framework for few-shot evaluation of language models. ☆9,860 · Updated last week
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆4,224 · Updated 6 months ago
- LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath ☆9,450 · Updated 2 months ago
- A state-of-the-art-level open visual language model | multimodal pre-trained model ☆6,642 · Updated last year
- 🐝 The First Graph Agentic Framework with RL and Prompt Optimization ☆919 · Updated 7 months ago
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,375 · Updated 3 months ago
- Python bindings for llama.cpp ☆9,486 · Updated last week
- Retrieval and Retrieval-augmented LLMs ☆10,364 · Updated this week
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,700 · Updated last year
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆10,976 · Updated last month
- Chat language model that can use tools and interpret the results ☆1,578 · Updated 3 weeks ago
- Large-scale LLM inference engine ☆1,524 · Updated this week
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,… ☆2,175 · Updated last year
- High-speed Large Language Model Serving for Local Deployment ☆8,309 · Updated 3 weeks ago