mistralai / mistral-inference
Official inference library for Mistral models
☆9,921 · Updated 2 months ago
Alternatives and similar repositories for mistral-inference:
Users interested in mistral-inference are comparing it to the libraries listed below:
- Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, RAG. We als… ☆16,133 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆36,895 · Updated this week
- Large Language Model Text Generation Inference ☆9,710 · Updated this week
- tiktoken is a fast BPE tokeniser for use with OpenAI's models. ☆13,255 · Updated 4 months ago
- Go ahead and axolotl questions ☆8,484 · Updated this week
- Train transformer language models with reinforcement learning. ☆11,140 · Updated this week
- LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath ☆9,333 · Updated 6 months ago
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆8,171 · Updated 9 months ago
- SGLang is a fast serving framework for large language models and vision language models. ☆8,748 · Updated this week
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆6,782 · Updated 6 months ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆10,218 · Updated 7 months ago
- High-speed Large Language Model Serving for Local Deployment ☆8,074 · Updated last week
- Tensor library for machine learning ☆11,760 · Updated this week
- Modeling, training, eval, and inference code for OLMo ☆5,132 · Updated this week
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. ☆17,188 · Updated this week
- Python bindings for llama.cpp ☆8,567 · Updated last week
- Inference code for CodeLlama models ☆16,184 · Updated 5 months ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. ☆11,473 · Updated this week
- Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models" ☆11,232 · Updated last month
- [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond. ☆21,326 · Updated 5 months ago
- Fast and memory-efficient exact attention ☆15,355 · Updated this week
- Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud. ☆10,522 · Updated this week
- A state-of-the-art-level open visual language model | multimodal pretrained model ☆6,327 · Updated 8 months ago
- Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Ad… ☆6,029 · Updated 5 months ago
- Accessible large language models via k-bit quantization for PyTorch. ☆6,608 · Updated this week
- Tools for merging pretrained large language models. ☆5,216 · Updated this week
- Open source codebase powering the HuggingChat app ☆8,097 · Updated this week
- PyTorch native post-training library ☆4,802 · Updated this week
- Letta (formerly MemGPT) is a framework for creating LLM services with memory. ☆14,359 · Updated this week
- Inference Llama 2 in one file of pure C ☆17,993 · Updated 6 months ago