A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆64 · Oct 13, 2023 · Updated 2 years ago
Alternatives and similar repositories for exllama
Users who are interested in exllama are comparing it to the libraries listed below.
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,924 · Sep 30, 2023 · Updated 2 years ago
- Boosting Natural Language Generation from Instructions with Meta-Learning ☆11 · Dec 20, 2022 · Updated 3 years ago
- Let us make Psychohistory (as in Asimov) a reality, and accessible to everyone. Useful for LLM grounding and games / fiction / business /… ☆40 · Apr 9, 2023 · Updated 3 years ago
- LLM finetuning ☆42 · Aug 9, 2023 · Updated 2 years ago
- VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26] ☆17 · Jun 1, 2026 · Updated last month
- WebUI StartGUI is a Python graphical user interface (GUI) written with PyQT5, that allows users to configure settings and start the oobab… ☆16 · Jun 3, 2023 · Updated 3 years ago
- Prototype routines for GPU quantization written using PyTorch. ☆21 · Apr 15, 2026 · Updated 2 months ago
- This is an implementation of the audio source separation model as well as the evaluation metrics proposed in the paper "Weakly Informed A… ☆12 · Nov 26, 2019 · Updated 6 years ago
- An insanely secure password manager. ☆17 · Mar 10, 2026 · Updated 3 months ago
- An intelligent code optimization system leveraging AI analysis, automated refactoring, and test generation. Built with DSPy and Gradio, i… ☆19 · Feb 1, 2025 · Updated last year
- Official code for Generative Fractional Diffusion Models ☆18 · Jan 16, 2025 · Updated last year
- A trade robot on pumpfun use DeepSeek AI ☆12 · Feb 5, 2025 · Updated last year
- Repository for CPU Kernel Generation for LLM Inference ☆28 · Jul 13, 2023 · Updated 2 years ago
- Prototype UI for chatting with the Pygmalion models. ☆238 · Jun 1, 2023 · Updated 3 years ago
- ☆11 · Jan 15, 2020 · Updated 6 years ago
- ☆16 · Jun 18, 2026 · Updated last week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,567 · Mar 4, 2026 · Updated 3 months ago
- dgenerate is a scriptable command line tool (and library) for generating images and animation sequences using stable diffusion and relate… ☆44 · Oct 15, 2025 · Updated 8 months ago
- 🎓Automatically Update CV Papers Daily using Github Actions (Update Every 12th hours) ☆12 · May 17, 2026 · Updated last month
- Extension for Text Generation Webui based on EdgeGPT, a reverse engineered API of Microsoft's Bing Chat AI ☆124 · Oct 2, 2023 · Updated 2 years ago
- ☆21 · Sep 11, 2023 · Updated 2 years ago
- Tokenizer for Text to Speech (TTS) models ☆14 · Jan 16, 2025 · Updated last year
- An auto save extension for text generated with the oobabooga WebUI ☆26 · Oct 6, 2025 · Updated 8 months ago
- ☆21 · Mar 3, 2025 · Updated last year
- ☆16 · Aug 10, 2022 · Updated 3 years ago
- API for extending the Obsidian plugin Juggl ☆29 · Nov 5, 2023 · Updated 2 years ago
- ☆16 · Feb 10, 2023 · Updated 3 years ago
- LLM Quantization toolkit ☆20 · Jun 11, 2026 · Updated 2 weeks ago
- GPTQ inference Triton kernel ☆323 · May 18, 2023 · Updated 3 years ago
- Personalized all-purpose AI assistance platform based on hierarchical cooperative multi-agent framework which utilizes websocket connecti… ☆38 · Aug 11, 2024 · Updated last year
- ☆122 · Apr 22, 2024 · Updated 2 years ago
- Open Security Controls Assessment Language Toolbox ☆18 · Apr 22, 2026 · Updated 2 months ago
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3) ☆223 · Feb 11, 2026 · Updated 4 months ago
- A curated list of my GitHub stars! ☆24 · Updated this week
- ☆136 · May 26, 2026 · Updated last month
- ☆26 · Mar 6, 2024 · Updated 2 years ago
- SimBionic makes it possible to specify real-time intelligent software agents quickly, visually, and intuitively by drawing and configurin… ☆12 · Nov 2, 2022 · Updated 3 years ago
- graph search neural network ☆16 · Dec 8, 2018 · Updated 7 years ago
- Implementation of the HEX graph described in the paper of Large-Scale Object Classification using Label Relation Graphs (ECCV 2014) ☆19 · Jul 6, 2017 · Updated 8 years ago