dvmazur / mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
☆2,297 · Updated 11 months ago
Alternatives and similar repositories for mixtral-offloading:
Users interested in mixtral-offloading are comparing it to the libraries listed below.
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs · ☆2,842 · Updated 3 weeks ago
- Training LLMs with QLoRA + FSDP · ☆1,464 · Updated 4 months ago
- ☆2,892 · Updated 6 months ago
- prompt2model - Generate Deployable Models from Natural Language Instructions · ☆1,987 · Updated 3 months ago
- Tools for merging pretrained large language models. · ☆5,478 · Updated this week
- Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch. · ☆685 · Updated 7 months ago
- Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… · ☆3,342 · Updated last month
- ☆942 · Updated last month
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling · ☆1,640 · Updated 8 months ago
- Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. · ☆5,896 · Updated 2 weeks ago
- ☆4,070 · Updated 9 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars · ☆980 · Updated 8 months ago
- Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization · ☆1,276 · Updated 3 months ago
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLMs' perception of key information, compress the prompt and KV-Cache, which ach… · ☆4,969 · Updated 2 weeks ago
- ☆1,577 · Updated 2 weeks ago
- A blazing fast inference solution for text embeddings models · ☆3,345 · Updated last week
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications. · ☆3,007 · Updated 2 weeks ago
- Open-source tool to visualise your RAG 🔮 · ☆1,119 · Updated 2 months ago
- A RAG LLM co-pilot for browsing the web, powered by local LLMs · ☆1,491 · Updated 2 months ago
- Llama-3 agents that can browse the web by following instructions and talking to you · ☆1,393 · Updated 3 months ago
- PyTorch native post-training library · ☆5,026 · Updated this week
- Robust recipes to align language models with human and AI preferences · ☆5,090 · Updated 4 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… · ☆2,164 · Updated 5 months ago
- [ICML'24] Magicoder: Empowering Code Generation with OSS-Instruct · ☆2,003 · Updated 4 months ago
- Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.p… · ☆1,228 · Updated 3 weeks ago
- 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale. · ☆11,878 · Updated this week
- The hub for EleutherAI's work on interpretability and learning dynamics · ☆2,429 · Updated 2 weeks ago
- Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A · ☆961 · Updated last year
- The official implementation of Self-Play Fine-Tuning (SPIN) · ☆1,138 · Updated 10 months ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality · ☆3,746 · Updated 7 months ago