Finetune llama2-70b and codellama on MacBook Air without quantization
☆450 · Mar 28, 2024 · Updated 2 years ago
Alternatives and similar repositories for slowllama
Users that are interested in slowllama are comparing it to the libraries listed below.
- Llama 2 Everywhere (L2E) ☆1,528 · Aug 27, 2025 · Updated 7 months ago
- Finetune ALL LLMs with ALL Adapters on ALL Platforms! ☆332 · Jul 23, 2025 · Updated 8 months ago
- Seamlessly integrate LLMs as Python functions ☆2,405 · Mar 11, 2026 · Updated last month
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,497 · Mar 4, 2026 · Updated last month
- A simple "Be My Eyes" web app with a llama.cpp/llava backend ☆494 · Nov 28, 2023 · Updated 2 years ago
- Turn expensive prompts into cheap fine-tuned models ☆2,791 · May 25, 2024 · Updated last year
- Toolkit for fine-tuning, ablating and unit-testing open-source LLMs. ☆870 · Oct 25, 2024 · Updated last year
- A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for vario… ☆1,050 · Feb 27, 2025 · Updated last year
- An extensible, easy-to-use, and portable diffusion web UI 👨‍🎨 ☆1,673 · Aug 18, 2023 · Updated 2 years ago
- pykoi: Active learning in one unified interface ☆411 · Sep 24, 2025 · Updated 6 months ago
- Simple UI for LLM Model Finetuning ☆2,055 · Dec 21, 2023 · Updated 2 years ago
- Agents Capable of Self-Editing Their Prompts / Python Code ☆806 · Mar 15, 2024 · Updated 2 years ago
- Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a RPI Zero 2 (or in 298MB of RAM) but… ☆2,067 · Jan 20, 2026 · Updated 2 months ago
- Experimentation with Streamlit for personal LLM tool ☆15 · Jun 19, 2023 · Updated 2 years ago
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. ☆867 · Jan 15, 2024 · Updated 2 years ago
- ☆3,363 · Feb 25, 2024 · Updated 2 years ago
- Structured Outputs ☆13,657 · Mar 26, 2026 · Updated 3 weeks ago
- Locust on k8s example for scalable load tests ☆14 · Apr 16, 2022 · Updated 4 years ago
- Count Tokens of Code (forked from gocloc) ☆45 · Aug 19, 2024 · Updated last year
- ☆25 · Sep 19, 2023 · Updated 2 years ago
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM ☆1,480 · May 1, 2025 · Updated 11 months ago
- AutoChain: Build lightweight, extensible, and testable LLM Agents ☆1,877 · Dec 16, 2025 · Updated 4 months ago
- Create and share easy-to-make, built-to-last, innovative, and customizable experiences ☆33 · Feb 21, 2024 · Updated 2 years ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… ☆111 · Sep 10, 2023 · Updated 2 years ago
- Distribute and run LLMs with a single file. ☆24,205 · Updated this week
- ☆611 · Mar 4, 2024 · Updated 2 years ago
- Fine-tune LLM agents with online reinforcement learning ☆1,251 · Mar 19, 2024 · Updated 2 years ago
- LLMs built upon Evol Instruct: WizardLM, WizardCoder, WizardMath ☆9,474 · Jun 7, 2025 · Updated 10 months ago
- ☆1,275 · Oct 24, 2023 · Updated 2 years ago
- Examples in the MLX framework ☆8,498 · Apr 6, 2026 · Updated last week
- 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading ☆10,079 · Sep 7, 2024 · Updated last year
- 💭 Chat with AI via API ☆33 · Oct 20, 2024 · Updated last year
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,211 · Jul 11, 2024 · Updated last year
- An LLM-powered advanced RAG pipeline built from scratch ☆857 · Jan 26, 2024 · Updated 2 years ago
- A toolkit for applying LLMs to sensitive, non-public data in offline or restricted environments ☆835 · Mar 24, 2026 · Updated 3 weeks ago
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆4,068 · Jan 8, 2025 · Updated last year
- Finetune an LLM to speak like you based on your WhatsApp Conversations ☆378 · May 5, 2024 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,915 · Sep 30, 2023 · Updated 2 years ago
- An LLM-based autonomous agent controlling real-world applications via RESTful APIs ☆1,392 · Jun 7, 2024 · Updated last year