rafacelente / bllama
1.58-bit LLaMa model
☆82 · Apr 3, 2024 · Updated last year
Alternatives and similar repositories for bllama
Users that are interested in bllama are comparing it to the libraries listed below
- Simple agent framework using Ollama tool calling ☆10 · Aug 27, 2024 · Updated last year
- ☆18 · Feb 22, 2024 · Updated last year
- ☆15 · Jun 4, 2025 · Updated 8 months ago
- A pure and fast NumPy implementation of Mamba with cache support. ☆18 · Jun 16, 2024 · Updated last year
- a lightweight, open-source blueprint for building powerful and scalable LLM chat applications ☆28 · Jun 7, 2024 · Updated last year
- Download full or partial git-lfs repos without temporarily using 2x disk space ☆31 · Oct 13, 2023 · Updated 2 years ago
- John Shutt's "Kernel" language implemented on ABE (C) runtime. ☆13 · Sep 3, 2018 · Updated 7 years ago
- Inference deployment of the llama3 ☆11 · Apr 21, 2024 · Updated last year
- A thin cython wrapper around llama.cpp, whisper.cpp and stable-diffusion.cpp ☆16 · Updated this week
- Modified Beam Search with periodical restart ☆12 · Sep 12, 2024 · Updated last year
- A universal adapter including zero-copy Python bindings for Philip Turner's metal flash attention library. ☆23 · Dec 15, 2025 · Updated last month
- Writing Tools, Apple's AI-inspired app, enchants Windows, enhancing your pen with AI LLMs. One hotkey press, system-wide, fixes grammar, … ☆26 · Jul 26, 2025 · Updated 6 months ago
- ☆51 · Feb 19, 2025 · Updated 11 months ago
- 🌳 MCTS-inspired parallel beam search for conversation optimization. Explore multiple dialogue strategies simultaneously, stress-test a… ☆35 · Jan 18, 2026 · Updated 3 weeks ago
- AI Based "Happiness Optimizer" ☆12 · Oct 20, 2024 · Updated last year
- A Python implementation of an agent swarm system that works with local LLM servers. The system allows you to create multiple agents that … ☆11 · Nov 20, 2024 · Updated last year
- Tiny ASIC implementation for "The Era of 1-bit LLMs All Large Language Models are in 1.58 Bits" matrix multiplication unit ☆175 · Apr 19, 2024 · Updated last year
- A compressed bitset with supporting data structures and algorithms ☆19 · Sep 13, 2013 · Updated 12 years ago
- Unofficial quantized flux1 model ☆12 · Aug 14, 2024 · Updated last year
- AirLLM 70B inference with single 4GB GPU ☆17 · Jun 27, 2025 · Updated 7 months ago
- a character-ai like UI for LLM ☆10 · Dec 3, 2024 · Updated last year
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs ☆15 · Jul 18, 2024 · Updated last year
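- An extension system based on Dify that third-party applications can call. It enhances Dify with dynamic knowledge base retrieval, database queries, PPT generation, and similar capabilities ☆20 · Apr 9, 2025 · Updated 10 months ago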
- Golang web client for Ollama, fast and easy to use. ☆31 · Jul 18, 2025 · Updated 6 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆150 · Jan 7, 2026 · Updated last month
- [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆327 · Nov 26, 2025 · Updated 2 months ago
- Red by Example - an accessible reference by example ☆14 · Nov 29, 2022 · Updated 3 years ago
- 33B Chinese LLM, DPO QLORA, 100K context, AirLLM 70B inference with single 4GB GPU ☆13 · May 5, 2024 · Updated last year
- Llama.cpp-qt is a Python-based GUI wrapper for the LLama.cpp server, providing a user-friendly interface for configuring and running the … ☆16 · Oct 4, 2023 · Updated 2 years ago
- AI Assistant ☆20 · Apr 18, 2025 · Updated 9 months ago
- Writing Extension for Text Generation WebUI ☆64 · Aug 7, 2025 · Updated 6 months ago
- A TTS model capable of generating ultra-realistic dialogue in one pass. ☆31 · May 1, 2025 · Updated 9 months ago
- various experiments for scaling inference time compute with small reasoning models ☆17 · Jan 16, 2025 · Updated last year
- ☆17 · Dec 23, 2024 · Updated last year
- These agents work based on any local model. You ask your question and simply indicate the number of agents and experts who will answer it… ☆19 · Feb 25, 2024 · Updated last year
- Entropy Based Sampling and Parallel CoT Decoding ☆17 · Oct 9, 2024 · Updated last year
- Hacks for PyTorch ☆19 · Apr 18, 2023 · Updated 2 years ago
- ☆163 · Jun 22, 2025 · Updated 7 months ago
- ☆11 · Feb 6, 2026 · Updated last week