lyogavin / airllm
AirLLM 70B inference with single 4GB GPU
☆7,011 · Updated 4 months ago
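AirLLM is commonly described as achieving this by running the model layer by layer: each transformer layer's weights are loaded from disk only when needed and released afterwards, so peak GPU memory stays near the size of a single layer rather than the full 70B model. A minimal, dependency-free sketch of that idea (the function names here are hypothetical illustrations, not the airllm API):

```python
# Sketch of the layered-inference idea behind AirLLM: instead of holding
# all layers in memory at once, load one layer's weights, apply it, and
# let them be freed before the next layer. Names are illustrative only.

def apply_layer(hidden, weights):
    """Toy stand-in for a transformer layer: scale then shift each value."""
    scale, shift = weights
    return [scale * h + shift for h in hidden]

def layered_inference(hidden, layer_store):
    """Run layers one at a time so peak memory ~ one layer, not the model."""
    for layer_id in range(len(layer_store)):
        weights = layer_store[layer_id]  # in AirLLM, a load from disk
        hidden = apply_layer(hidden, weights)
        # `weights` goes out of scope here, so its memory can be reclaimed
    return hidden

# Two toy layers, each a (scale, shift) pair:
store = [(2.0, 1.0), (0.5, 0.0)]
print(layered_inference([1.0, 2.0], store))  # [1.5, 2.5]
```

The trade-off is speed: every forward pass re-reads each layer from storage, so this favors memory-constrained setups over throughput.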
Alternatives and similar repositories for airllm
Users interested in airllm are comparing it to the libraries listed below:
- Tools for merging pretrained large language models. ☆6,680 · Updated 2 weeks ago
- Go ahead and axolotl questions ☆11,098 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,414 · Updated last month
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,671 · Updated 7 months ago
- High-speed Large Language Model Serving for Local Deployment ☆8,572 · Updated 5 months ago
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,867 · Updated last year
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆3,039 · Updated 3 weeks ago
- Python bindings for llama.cpp ☆9,901 · Updated 5 months ago
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,521 · Updated this week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆5,019 · Updated 9 months ago
- Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audi… ☆9,280 · Updated last week
- Calculate tokens/s & GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ☆1,385 · Updated last year
- [EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… ☆5,767 · Updated 2 months ago
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆4,217 · Updated 2 weeks ago
- A state-of-the-art open visual language model | multimodal pretrained model ☆6,716 · Updated last year
- Retrieval and Retrieval-augmented LLMs ☆11,147 · Updated last month
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. ☆2,786 · Updated last month
- A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling. ☆3,766 · Updated last week
- A blazing fast inference solution for text embedding models ☆4,392 · Updated this week
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆13,092 · Updated this week
- Official inference library for Mistral models ☆10,619 · Updated last month
- Everything about the SmolLM and SmolVLM family of models ☆3,552 · Updated last month
- Structured Outputs ☆13,237 · Updated this week
- Official release of the InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3). ☆7,143 · Updated 2 months ago
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆2,301 · Updated 8 months ago
- Optimizing inference proxy for LLMs ☆3,274 · Updated 3 weeks ago
- A Next-Generation Training Engine Built for Ultra-Large MoE Models ☆5,051 · Updated this week
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ☆2,841 · Updated last year
- PyTorch native post-training library ☆5,642 · Updated this week
- Tensor library for machine learning ☆13,840 · Updated this week
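The memory-requirement calculator listed above rests on simple arithmetic: weight memory is roughly parameter count times bytes per parameter. A rough sketch of that estimate (a simplification that ignores KV cache, activations, and framework overhead, which real calculators also account for):

```python
# Rough weight-memory estimate for an LLM: parameters * bits-per-param / 8.
# This ignores KV cache and activation memory, so treat it as a floor.

def estimate_weight_memory_gb(n_params_billion, bits_per_param):
    """Return approximate GB (decimal) needed just for model weights."""
    bytes_total = n_params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 70B model at fp16 versus 4-bit quantization:
print(estimate_weight_memory_gb(70, 16))  # 140.0
print(estimate_weight_memory_gb(70, 4))   # 35.0
```

Even at 4 bits, 70B weights need about 35 GB, which is why fitting such a model on a 4 GB GPU requires streaming layers rather than quantization alone.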