1.58-bit LLaMa model
☆83Apr 3, 2024Updated last year
Alternatives and similar repositories for bllama
Users that are interested in bllama are comparing it to the libraries listed below
- Simple agent framework using Ollama tool calling☆10Aug 27, 2024Updated last year
- ☆18Feb 22, 2024Updated 2 years ago
- ☆15Jun 4, 2025Updated 9 months ago
- A pure and fast NumPy implementation of Mamba with cache support.☆18Jun 16, 2024Updated last year
- Your personal ArXiv Feed☆23Dec 18, 2024Updated last year
- Download full or partial git-lfs repos without temporarily using 2x disk space☆31Oct 13, 2023Updated 2 years ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆155Oct 15, 2024Updated last year
- A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp☆16Feb 10, 2026Updated 3 weeks ago
- A universal adapter including zero-copy Python bindings for Philip Turner's metal flash attention library.☆23Dec 15, 2025Updated 2 months ago
- Inference deployment for Llama 3☆11Apr 21, 2024Updated last year
- Modified Beam Search with periodical restart☆12Sep 12, 2024Updated last year
- John Shutt's "Kernel" language implemented on ABE (C) runtime.☆13Sep 3, 2018Updated 7 years ago
- ☆51Feb 19, 2025Updated last year
- Writing Tools, Apple's AI-inspired app, enchants Windows, enhancing your pen with AI LLMs. One hotkey press, system-wide, fixes grammar, …☆27Jul 26, 2025Updated 7 months ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a…☆35Jan 18, 2026Updated last month
- ☆12Feb 23, 2023Updated 3 years ago
- AI Based "Happiness Optimizer"☆12Oct 20, 2024Updated last year
- A Python implementation of an agent swarm system that works with local LLM servers. The system allows you to create multiple agents that …☆11Nov 20, 2024Updated last year
- Tiny ASIC implementation of the matrix multiplication unit from "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits"☆180Apr 19, 2024Updated last year
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization.☆17Aug 30, 2024Updated last year
- AirLLM 70B inference with single 4GB GPU☆17Jun 27, 2025Updated 8 months ago
- Unofficial quantized model for flux1☆12Aug 14, 2024Updated last year
- A Character.AI-like UI for LLMs☆10Dec 3, 2024Updated last year
- Karpathy's llama2.c transpiled to MLX for Apple Silicon☆14Dec 28, 2023Updated 2 years ago
- ☆16Dec 20, 2021Updated 4 years ago
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs☆15Jul 18, 2024Updated last year
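- An extension system built on Dify that third-party applications can call. It extends Dify with dynamic knowledge-base retrieval, database queries, PPT generation, and similar capabilities☆20Apr 9, 2025Updated 10 months ago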
- A simple library for working with Hugging Face models.☆14Dec 30, 2024Updated last year
- ☆21Dec 11, 2024Updated last year
- ☆30Feb 14, 2026Updated 3 weeks ago
- Golang web client for Ollama, fast and easy to use.☆32Jul 18, 2025Updated 7 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens☆150Jan 7, 2026Updated 2 months ago
- Red by Example - an accessible reference by example☆14Nov 29, 2022Updated 3 years ago
- Inference Llama 2 in one file of pure C☆14Jul 24, 2023Updated 2 years ago
- ☆14Dec 6, 2023Updated 2 years ago
- Llama.cpp-qt is a Python-based GUI wrapper for the LLama.cpp server, providing a user-friendly interface for configuring and running the …☆16Oct 4, 2023Updated 2 years ago
- AI Assistant☆20Feb 21, 2026Updated 2 weeks ago
- A TTS model capable of generating ultra-realistic dialogue in one pass.☆31May 1, 2025Updated 10 months ago
- Writing Extension for Text Generation WebUI☆66Aug 7, 2025Updated 7 months ago