okuvshynov / slowllama
Finetune llama2-70b and codellama on MacBook Air without quantization
☆450 · Mar 28, 2024 · Updated last year
Alternatives and similar repositories for slowllama
Users interested in slowllama are comparing it to the libraries listed below.
- Llama 2 Everywhere (L2E) ☆1,527 · Aug 27, 2025 · Updated 5 months ago
- Seamlessly integrate LLMs as Python functions ☆2,388 · Nov 24, 2025 · Updated 2 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,445 · Dec 9, 2025 · Updated 2 months ago
- Toolkit for fine-tuning, ablating and unit-testing open-source LLMs. ☆870 · Oct 25, 2024 · Updated last year
- Turn expensive prompts into cheap fine-tuned models ☆2,781 · May 25, 2024 · Updated last year
- A simple "Be My Eyes" web app with a llama.cpp/llava backend ☆492 · Nov 28, 2023 · Updated 2 years ago
- ☆25 · Sep 19, 2023 · Updated 2 years ago
- A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for vario… ☆1,044 · Feb 27, 2025 · Updated 11 months ago
- An extensible, easy-to-use, and portable diffusion web UI 👨🎨 ☆1,673 · Aug 18, 2023 · Updated 2 years ago
- Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a RPI Zero 2 (or in 298MB of RAM) but… ☆2,026 · Jan 20, 2026 · Updated 3 weeks ago
- Create and share easy-to-make, built-to-last, innovative, and customizable experiences ☆34 · Feb 21, 2024 · Updated last year
- Count Tokens of Code (forked from gocloc) ☆44 · Aug 19, 2024 · Updated last year
- Structured Outputs ☆13,403 · Feb 6, 2026 · Updated last week
- Simple UI for LLM Model Finetuning ☆2,063 · Dec 21, 2023 · Updated 2 years ago
- DataDM is your private data assistant. Slide into your data's DMs ☆386 · Oct 6, 2024 · Updated last year
- ☆3,372 · Feb 25, 2024 · Updated last year
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM ☆1,481 · May 1, 2025 · Updated 9 months ago
- ☆605 · Mar 4, 2024 · Updated last year
- Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate. ☆4,039 · Jan 8, 2025 · Updated last year
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. ☆867 · Jan 15, 2024 · Updated 2 years ago
- Running large language models on a single GPU for throughput-oriented scenarios. ☆9,384 · Oct 28, 2024 · Updated last year
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,188 · Jul 11, 2024 · Updated last year
- Go ahead and axolotl questions ☆11,289 · Updated this week
- 🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading ☆9,930 · Sep 7, 2024 · Updated last year
- Agents Capable of Self-Editing Their Prompts / Python Code ☆801 · Mar 15, 2024 · Updated last year
- An Open Source text-to-speech system built by inverting Whisper. ☆4,553 · Dec 14, 2025 · Updated 2 months ago
- Fine-tune LLM agents with online reinforcement learning ☆1,246 · Mar 19, 2024 · Updated last year
- 💭 Chat with AI via API ☆33 · Oct 20, 2024 · Updated last year
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al… ☆111 · Sep 10, 2023 · Updated 2 years ago
- An LLM-based autonomous agent controlling real-world applications via RESTful APIs ☆1,389 · Jun 7, 2024 · Updated last year
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,911 · Sep 30, 2023 · Updated 2 years ago
- TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones ☆1,307 · Feb 5, 2026 · Updated last week
- Distribute and run LLMs with a single file. ☆23,704 · Feb 10, 2026 · Updated last week
- ☆1,282 · Oct 24, 2023 · Updated 2 years ago
- A toolkit for applying LLMs to sensitive, non-public data in offline or restricted environments ☆811 · Jan 26, 2026 · Updated 3 weeks ago
- [EMNLP'23, ACL'24] To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which ach… ☆5,834 · Oct 28, 2025 · Updated 3 months ago
- Serving multiple LoRA finetuned LLM as one ☆1,139 · May 8, 2024 · Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆262 · Apr 23, 2024 · Updated last year
- XmodelLM ☆38 · Nov 19, 2024 · Updated last year