Finetune llama2-70b and codellama on MacBook Air without quantization
☆450 · Mar 28, 2024 · Updated 2 years ago
Alternatives and similar repositories for slowllama
Users that are interested in slowllama are comparing it to the libraries listed below.
- Llama 2 Everywhere (L2E) ☆1,528 · Aug 27, 2025 · Updated 7 months ago
- Seamlessly integrate LLMs as Python functions ☆2,401 · Mar 11, 2026 · Updated 2 weeks ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,476 · Mar 4, 2026 · Updated 3 weeks ago
- A simple "Be My Eyes" web app with a llama.cpp/llava backend ☆493 · Nov 28, 2023 · Updated 2 years ago
- Turn expensive prompts into cheap fine-tuned models ☆2,790 · May 25, 2024 · Updated last year
- Toolkit for fine-tuning, ablating and unit-testing open-source LLMs. ☆870 · Oct 25, 2024 · Updated last year
- A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for vario… ☆1,050 · Feb 27, 2025 · Updated last year
- An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨 ☆1,673 · Aug 18, 2023 · Updated 2 years ago
- pykoi: Active learning in one unified interface ☆411 · Sep 24, 2025 · Updated 6 months ago
- Simple UI for LLM Model Finetuning ☆2,060 · Dec 21, 2023 · Updated 2 years ago
- Agents Capable of Self-Editing Their Prompts / Python Code ☆803 · Mar 15, 2024 · Updated 2 years ago
- Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a RPI Zero 2 (or in 298MB of RAM) but… ☆2,043 · Jan 20, 2026 · Updated 2 months ago
- Experimentation with Streamlit for personal LLM tool ☆15 · Jun 19, 2023 · Updated 2 years ago
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. ☆867 · Jan 15, 2024 · Updated 2 years ago
- ☆3,368 · Feb 25, 2024 · Updated 2 years ago
- Structured Outputs ☆13,588 · Mar 21, 2026 · Updated last week
- Locust on k8s example for scalable load tests ☆14 · Apr 16, 2022 · Updated 3 years ago
- Count Tokens of Code (forked from gocloc) ☆45 · Aug 19, 2024 · Updated last year
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM ☆1,479 · May 1, 2025 · Updated 10 months ago
- AutoChain: Build lightweight, extensible, and testable LLM Agents ☆1,874 · Dec 16, 2025 · Updated 3 months ago
- Create and share easy-to-make, built-to-last, innovative, and customizable experiences ☆34 · Feb 21, 2024 · Updated 2 years ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… ☆111 · Sep 10, 2023 · Updated 2 years ago
- Distribute and run LLMs with a single file. ☆23,909 · Updated this week
- ☆610 · Mar 4, 2024 · Updated 2 years ago
- Fine-tune LLM agents with online reinforcement learning ☆1,249 · Mar 19, 2024 · Updated 2 years ago
- LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath ☆9,476 · Jun 7, 2025 · Updated 9 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆263 · Apr 23, 2024 · Updated last year
- Examples in the MLX framework ☆8,402 · Feb 12, 2026 · Updated last month
- ☆1,274 · Oct 24, 2023 · Updated 2 years ago
- 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading ☆10,031 · Sep 7, 2024 · Updated last year
- 💭 Chat with AI via API ☆33 · Oct 20, 2024 · Updated last year
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,209 · Jul 11, 2024 · Updated last year
- An LLM-powered advanced RAG pipeline built from scratch ☆860 · Jan 26, 2024 · Updated 2 years ago
- A toolkit for applying LLMs to sensitive, non-public data in offline or restricted environments ☆838 · Updated this week
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆4,057 · Jan 8, 2025 · Updated last year
- Finetune a LLM to speak like you based on your WhatsApp Conversations ☆378 · May 5, 2024 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,912 · Sep 30, 2023 · Updated 2 years ago
- DataDM is your private data assistant. Slide into your data's DMs ☆386 · Oct 6, 2024 · Updated last year
- An LLM-based autonomous agent controlling real-world applications via RESTful APIs ☆1,395 · Jun 7, 2024 · Updated last year