lyogavin / airllm
AirLLM: 70B inference with a single 4GB GPU
☆ 6,316 · Updated 2 months ago
Alternatives and similar repositories for airllm
Users who are interested in airllm are comparing it to the libraries listed below.
- A fast inference library for running LLMs locally on modern consumer-class GPUs · ☆ 4,364 · Updated 3 months ago
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. · ☆ 7,265 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs · ☆ 3,531 · Updated 5 months ago
- Tools for merging pretrained large language models. · ☆ 6,447 · Updated 2 weeks ago
- A Next-Generation Training Engine Built for Ultra-Large MoE Models · ☆ 4,988 · Updated this week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. · ☆ 4,988 · Updated 7 months ago
- Modeling, training, eval, and inference code for OLMo · ☆ 6,124 · Updated 3 weeks ago
- Go ahead and axolotl questions · ☆ 10,798 · Updated this week
- PyTorch native post-training library · ☆ 5,584 · Updated last week
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… · ☆ 5,585 · Updated 3 weeks ago
- High-speed Large Language Model Serving for Local Deployment · ☆ 8,388 · Updated 3 months ago
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models · ☆ 2,835 · Updated 10 months ago
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… · ☆ 2,927 · Updated last week
- A blazing fast inference solution for text embeddings models · ☆ 4,201 · Updated this week
- Retrieval and Retrieval-augmented LLMs · ☆ 10,844 · Updated 3 weeks ago
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks · ☆ 7,122 · Updated last year
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance · ☆ 9,487 · Updated last month
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation: · ☆ 2,269 · Updated 6 months ago
- Python bindings for llama.cpp · ☆ 9,735 · Updated 3 months ago
- A state-of-the-art-level open visual language model | multimodal pre-trained model · ☆ 6,689 · Updated last year
- Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization · ☆ 1,380 · Updated 11 months ago
- OpenChat: Advancing Open-source Language Models with Imperfect Data · ☆ 5,439 · Updated last year
- A lightweight framework for building LLM-based agents · ☆ 2,202 · Updated 3 months ago
- Enforce the output format (JSON Schema, Regex, etc.) of a language model · ☆ 1,952 · Updated 2 months ago
- Infinity is a high-throughput, low-latency serving engine for text embeddings, reranking models, CLIP, CLAP and ColPali · ☆ 2,541 · Updated 2 weeks ago
- GPT-4V-level open-source multi-modal model based on Llama3-8B · ☆ 2,423 · Updated 8 months ago
- An Open Large Reasoning Model for Real-World Solutions · ☆ 1,527 · Updated 5 months ago
- Chat language model that can use tools and interpret the results · ☆ 1,587 · Updated last week
- Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3). · ☆ 7,108 · Updated 2 weeks ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding · ☆ 2,260 · Updated 5 months ago